Description |
The rapid advancement of artificial intelligence (AI) in medical imaging promises to improve diagnostic accuracy and efficiency. One active area of research is the use of deep-learning-based automatic detection algorithms (DLAD) in chest radiography, which have shown considerable potential for identifying findings such as tuberculosis or pulmonary lesions. However, despite promising results under the controlled, high-prevalence simulated conditions typical of research settings, concerns remain about how these applications perform in real-world scenarios. For our study, we collected 956 chest X-ray images (CXRs) from daily clinical practice at a municipal hospital. Two central readers with access to each patient’s previous and subsequent examinations achieved blinded agreement for 901 CXRs, of which 21 were visually confirmed to contain one or more pulmonary lesions (prevalence: 2.3%) and 880 were found to contain no pulmonary lesions. Six radiologists of varying expertise were asked to analyze these images retrospectively. The performance of each radiologist was then benchmarked against the ground truth and the proposed DLAD (2.0.20-v2.01). The proposed DLAD demonstrated significantly higher sensitivity (Se of 0.905 (0.715-0.978)) than all assessed radiologists (RAD 1 0.238 (0.103-0.448), p < 0.001, RAD 2 0.333 (0.170-0.544), p < 0.001, RAD 3 0.524 (0.324-0.717), p < 0.001, RAD 4 0.619 (0.410-0.794), p < 0.001, RAD 5 0.667 (0.456-0.830), p < 0.001, RAD 6 0.619 (0.410-0.794), p < 0.001).
The DLAD specificity (Sp of 0.893 (0.871-0.912)) was significantly lower than that of five of the six radiologists (RAD 1 0.999 (0.994-1.000), p < 0.001, RAD 2 0.933 (0.915-0.948), p < 0.001, RAD 4 0.968 (0.955-0.978), p < 0.001, RAD 5 0.991 (0.982-0.996), p < 0.001, RAD 6 0.989 (0.979-0.994), p < 0.001). The exception was one radiologist with mid-level experience, whose specificity was slightly lower than that of the DLAD, although this difference was not statistically significant (RAD 3 0.884 (0.861-0.904), p = 0.685). The results of this study show that the proposed DLAD achieves high sensitivity and reasonably reliable specificity even when applied in low-prevalence real-world settings. The proposed DLAD can therefore be considered beneficial for both junior and more experienced radiologists.
|
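
As an illustration of how the reported sensitivity, specificity, and prevalence relate to the underlying case counts, the following minimal Python sketch recomputes them. The counts of 19 true positives and 786 true negatives are not stated in the abstract; they are back-calculated here from the rounded proportions (0.905 of 21 and 0.893 of 880) and should be treated as assumptions. The abstract also does not say how its confidence intervals were computed, so the Wilson score interval below is only one common choice and is not expected to reproduce the published bounds exactly.

```python
from math import sqrt


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts stated in the abstract.
lesion_cases = 21       # CXRs confirmed to contain pulmonary lesions
normal_cases = 880      # CXRs confirmed to contain no pulmonary lesions

# ASSUMPTION: counts inferred from the rounded Se and Sp, not reported directly.
dlad_true_positives = 19
dlad_true_negatives = 786

sensitivity = dlad_true_positives / lesion_cases
specificity = dlad_true_negatives / normal_cases
prevalence = lesion_cases / (lesion_cases + normal_cases)

se_low, se_high = wilson_interval(dlad_true_positives, lesion_cases)
sp_low, sp_high = wilson_interval(dlad_true_negatives, normal_cases)

print(f"Se = {sensitivity:.3f} (Wilson 95% CI {se_low:.3f}-{se_high:.3f})")
print(f"Sp = {specificity:.3f} (Wilson 95% CI {sp_low:.3f}-{sp_high:.3f})")
print(f"Prevalence = {prevalence:.1%}")
```

With only 21 lesion cases, each additional detected or missed lesion shifts sensitivity by almost five percentage points, which is why the sensitivity intervals in the abstract are much wider than the specificity intervals.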