Although the study concluded AI missed equal or fewer remarkable chest radiographs than radiology reports, AI had greater critical miss rates.
A commercial artificial intelligence (AI) tool has the potential to rule out pathology in between 24.5% and 52.7% of unremarkable chest X-rays, a new study found.1
Radiologists typically review a high volume of unremarkable chest radiographs. With an AI tool ruling these out, they could spend less time on normal radiographs and focus on those with potential abnormalities.
“Our group and others have previously shown that AI tools are capable of excluding pathology in chest X-rays with high confidence and thereby provide an autonomous normal report without a human-in-the-loop,” said lead investigator Louis Lind Plesner, MD, from the department of radiology at Herlev and Gentofte Hospital in Copenhagen, Denmark, in a press release.2 “Such AI algorithms miss very few abnormal chest radiographs. However, before our current study, we didn’t know what the appropriate threshold was for these models.”
Investigators sought to determine how well an AI tool can correctly rule out unremarkable chest radiographs and how its performance compares with that of radiologists.1 In their retrospective study, the team examined consecutive chest X-rays obtained from January 1 to January 12, 2020, at four hospitals in Denmark. Radiographs with insufficient radiology reports or AI output errors were excluded.
The study included 1961 patients aged ≥ 18 years. The median age was 72 years (IQR, 58 – 81 years), and 993 were female. One chest radiograph per patient was analyzed.
Two thoracic radiologists, blinded to the AI output, reviewed the X-rays and labeled each as either “remarkable” (containing noteworthy findings) or “unremarkable” (normal), using predefined criteria for what counted as unremarkable.
The commercial AI tool generated a remarkability probability score for each chest radiograph, which was used to measure the AI’s specificity at different sensitivity levels. The tool had an area under the receiver operating characteristic curve of 0.928 (95% confidence interval [CI], 0.917 – 0.939).
The radiologists then reviewed radiographs where the AI or radiology reports missed findings and classified these missed findings as critical, clinically significant, or clinically insignificant. Afterward, investigators compared the performance of AI and radiologists using the McNemar test.
The radiologists labeled 1231 of 1961 chest radiographs (62.8%) as remarkable and 730 of 1961 (37.2%) as unremarkable. The AI tool correctly ruled out 24.5%, 47.1%, and 52.7% of unremarkable radiographs at sensitivities of 99.9% (1230 of 1231 radiographs; 95% CI, 99 – 100), 99% (1219 of 1231; 95% CI, 98 – 99), and 98% (1207 of 1231; 95% CI, 97 – 99), respectively.
At the 99% sensitivity threshold, the AI had a sensitivity of 97.5% for chest radiographs with only 1 remarkable finding (355 of 364; 95% CI, 95 – 99), 99.1% for 2 remarkable findings (338 of 341; 95% CI, 97 – 100), and 100% for ≥ 3 remarkable findings (544 of 544; 95% CI, 99 – 100).
The AI identified significantly more unremarkable radiographs when a CT image was available (51.3%) than when one was not (38.8%; P < .001). However, its sensitivity did not differ between cases with (98.9%) and without (99.1%) an available CT image (P = .79).
Human radiology reports had a sensitivity of 87%, missing 12.8% of remarkable chest radiographs. However, AI had a greater critical miss rate: among the 158 missed chest radiographs, 17.1% were critical (P = .01), 32.3% were clinically significant (P = .46), and half were clinically insignificant (P = .11).
The AI tool missed critical or clinically significant findings in 6.3% of remarkable chest radiographs (78 of 1231), compared with 4.7% for radiologists. Overall, the AI had a 2.2% critical miss rate and a 4.1% clinically significant miss rate, versus 1.1% (P = .01) and 3.6% (P = .46), respectively, for radiologists.
Investigators noted AI mistakes were often more clinically severe for the patient than radiologist mistakes.
“This is likely because radiologists interpret findings based on the clinical scenario, which AI does not,” Plesner said.2 “Therefore, when AI is intended to provide an automated normal report, it has to be more sensitive than the radiologist to avoid decreasing the standard of care during implementation.”
References