An analysis of 7 AI algorithms for detecting diabetic retinopathy is shedding light on variability in outcomes among the screening systems that have become available in recent years.
A real-world validation study examining more than half a dozen artificial intelligence (AI) algorithms for detecting the presence of diabetic retinopathy suggests such an approach could lead to inconsistent results.
An analysis of more than 300,000 images from diabetic retinopathy screenings found high negative predictive values across the algorithms but wide variability in their sensitivity.
“Two automated artificial intelligence (AI)-based DR screening algorithms have FDA approval. Several others are under consideration while in clinical use in other countries, but their real-world performance has not been evaluated systematically,” wrote investigators.
In recent years, multiple companies have developed products designed to bring diabetic retinopathy screening into primary care settings, including 2 that have received approval from the US Food and Drug Administration. With an interest in comparing these algorithms and others in use throughout the world, Aaron Lee, MD, MSc, of the University of Washington School of Medicine, and a team of colleagues designed a multicenter, head-to-head, real-world validation study of 7 diabetic retinopathy screening systems.
For the purpose of their study, investigators obtained imaging data from 23,724 veterans who presented for teleretinal diabetic retinopathy screening at the Veterans Affairs (VA) Puget Sound Health Care System or the Atlanta VA Health Care System from 2006 to 2018. From these 23,724 veterans, investigators obtained a total of 311,064 images.
In total, 5 companies provided 7 AI-based algorithms for the analysis: Eyenuk, Retina-AI Health, Airdoc, Retmarker, and OphtAI. When comparing the algorithms against grading by human screeners, investigators found 2 algorithms achieved greater sensitivity than the human screeners and one achieved comparable sensitivity (80.47%, P=.441) and specificity (81.28%, P=.195).
Overall, investigators observed high negative predictive values, while sensitivities varied widely across the algorithms examined in the study (50.98%-85.90%). Investigators also noted one algorithm had lower sensitivity for proliferative diabetic retinopathy than the human screeners. Additional analyses indicated the value per encounter ranged from $15.14 to $18.06 for ophthalmologists and from $7.74 to $9.24 for optometrists.
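For readers less familiar with these screening metrics, the sketch below (in Python, using entirely hypothetical counts that are not drawn from the study) shows how sensitivity, specificity, and negative predictive value are computed once an algorithm's referral calls are tallied against human reference grading.

```python
# Illustrative sketch only: how sensitivity, specificity, negative predictive
# value (NPV), and positive predictive value (PPV) are derived when an
# algorithm's referral decisions are compared against reference grading by
# human screeners. The counts below are hypothetical, not data from the study.

def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute basic screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true disease cases flagged
        "specificity": tn / (tn + fp),  # share of disease-free cases cleared
        "npv": tn / (tn + fn),          # confidence in a negative result
        "ppv": tp / (tp + fp),          # confidence in a positive result
    }

if __name__ == "__main__":
    # Hypothetical screening population: 1,000 eyes, 10% with referable disease.
    metrics = screening_metrics(tp=80, fp=170, tn=730, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

Because referable disease is relatively uncommon in a screening population, negative predictive value can remain high even when sensitivity varies considerably, a pattern consistent with the uniformly high negative predictive values and widely varying sensitivities the investigators report.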
"It's alarming that some of these algorithms are not performing consistently since they are being used somewhere in the world," said lead researcher Aaron Lee, MD, MSc, assistant professor of ophthalmology at the University of Washington School of Medicine, in a statement from the University of Washington.
This study, “Multicenter, Head-to-Head, Real-World Validation Study of Seven Automated Artificial Intelligence Diabetic Retinopathy Screening Systems,” was published in Diabetes Care.