A comparison of 3 artificial intelligence platforms showed differing prevalence of diabetic retinopathy, suggesting inconsistencies when applying these platforms to detect retinopathy in patients not known to have diabetes.1
The data, presented at the 83rd Scientific Sessions of the American Diabetes Association (ADA 2023), came from the first direct comparison in which 3 independent artificial intelligence systems simultaneously assessed retinal images from ≥10,000 individuals.
“Whilst artificial intelligence performs well in diagnosing moderate retinopathy, its ability to diagnose mild retinopathy is unreliable,” wrote the investigative team, led by Alexandra E. Butler, MBBS, PhD, Royal College of Surgeons in Ireland.
Butler and colleagues determined a glycemic threshold of 6.5% for the diagnosis of type 2 diabetes (T2D), based on the prevalence of moderate retinopathy, but noted that diabetes-related complications can occur below that threshold.2 In their analysis, 3 companies used their commercial software to analyze retinal images from subjects not known to have diabetes and returned, on a per-eye basis, whether features of any diabetic retinopathy were present or absent.1
Ophthalmic retinal images from a total of 11,449 patients not known to have diabetes from the Qatar Biobank were assessed for features of diabetic retinopathy. Images of both the right and left eyes were included, totaling 22,898 images. The analysis used 3 Conformité Européenne (CE)-marked independent artificial intelligence software systems.
The analysis used Krippendorff’s alpha to assess agreement among more than 2 raters (i.e., all 3 artificial intelligence systems) and Cohen’s kappa coefficient to compare results between pairs of raters. Investigators noted that both Krippendorff’s alpha and Cohen’s kappa should be ≥0.7 to indicate good agreement between raters.
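The two agreement statistics the investigators relied on can be sketched in plain Python. The following is a minimal illustration of Cohen's kappa and nominal-scale Krippendorff's alpha for complete data (every rater labels every unit, no missing values); the function names and toy labels are illustrative assumptions, not the study's actual implementation.

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' labels on the same units."""
    n = len(r1)
    # observed agreement: fraction of units where the raters match
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement: from each rater's marginal label frequencies
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2
    return (p_o - p_e) / (1 - p_e)

def krippendorff_alpha_nominal(ratings):
    """Nominal Krippendorff's alpha; ratings is a list of per-unit
    rating lists, each with the same number of raters (no missing data)."""
    n_units = len(ratings)
    m = len(ratings[0])          # raters per unit
    total = n_units * m          # total number of ratings
    # observed disagreement: fraction of disagreeing rater pairs within units
    d_o = 0.0
    for unit in ratings:
        counts = Counter(unit)
        d_o += sum(c * (m - c) for c in counts.values()) / (m * (m - 1))
    d_o /= n_units
    # expected disagreement: from pooled label frequencies across all units
    pooled = Counter(v for unit in ratings for v in unit)
    d_e = sum(c * (total - c) for c in pooled.values()) / (total * (total - 1))
    return 1 - d_o / d_e
```

Both statistics equal 1.0 under perfect agreement and fall toward (or below) 0 when raters agree no better than chance, which is why the investigators treated values below 0.7 as poor agreement.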
Of the 11,449 subjects in the analysis, diabetic retinopathy was recorded as absent in both eyes in 7,094 (62.0%) patients by all 3 artificial intelligence methods (test-negative at the patient level). Diabetic retinopathy was detected by 2 or 3 of the artificial intelligence systems in ≥1 eye in a total of 2,408 (21.0%) patients (test-positive at the patient level). Of this subpopulation, in 1,532 (13.4%) cases, diabetic retinopathy was detected in only 1 eye of a given patient; in 876 (7.6%) cases, it was detected in both eyes.
Overall, 1,947 (17.0%) patients had ≥1 eye reported as unassessable by 2 or 3 of the artificial intelligence systems; thus, no overall result for diabetic retinopathy features could be obtained for these patients.
Investigators indicated results from the 3 artificial intelligence companies were markedly different, with test-positive rates at the patient level of 17.7%, 32.9%, and 46.6%. Only 304 right eyes and 380 left eyes were positive by all 3 companies, they indicated.
The team tested a sample of 5,381 patients, finding Krippendorff’s alpha was 0.117, suggesting very low agreement across all 3 raters (AI #1 vs AI #2: Cohen’s kappa, 0.180; AI #1 vs AI #3: Cohen’s kappa, 0.150; AI #2 vs AI #3: Cohen’s kappa, 0.032).
“Intensive training of the algorithm is needed before artificial intelligence can be utilized for the early diagnosis of mild retinopathy,” the investigative team wrote.
References