Algorithm Detects Diabetic Retinopathy in Retinal Images with 97% Accuracy

The deep learning algorithm was tested on 71,000 images, and offers the potential to significantly increase the speed at which DR can be identified.

A novel artificial intelligence-based deep learning algorithm developed by researchers at the Center for Eye Research Australia (CERA) in Melbourne, Victoria can detect referable diabetic retinopathy (DR) from retinal images with 97% accuracy, according to new research released at the 2018 meeting of the Association for Research in Vision and Ophthalmology (ARVO).

Stuart Keel, PhD

The cloud-based convolutional neural network (CNN) deep learning system could make DR screening significantly more efficient, both by speeding up the identification of DR in relevant images and by reducing the time spent poring through retinal images that are not affected by the condition, said Stuart Keel, PhD, a postdoctoral research fellow at CERA.

“Also, there’s great potential to provide greater accessibility of DR screening,” Keel said during a presentation at the Hawaii Convention Center on April 29. “Particularly in those low resource areas such as developing nations, regional remote areas, and in particular countries and minority populations.”

Keel and colleagues at CERA tested the algorithm’s capability to identify referable DR (defined as greater than or equal to pre-proliferative DR and/or diabetic macular edema) on a set of more than 71,000 non-stereoscopic retinal images collected from various clinical settings in China.

The group recruited 21 ophthalmologists to grade the images according to the NHS DR screening classification system. Each grader had passed strict inclusion criteria by demonstrating a high level of agreement with CERA’s specialists on a subset of 200 images. Each image was first assigned to an individual grader, then sequentially assigned to other individual graders until 3 consistent grading outcomes were achieved. “This was assigned as the gold standard grading for each particular image,” Keel said.
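The sequential grading rule described above can be illustrated with a minimal Python sketch. It assumes that "3 consistent grading outcomes" means three graders assigning the same grade; the function name and the grade labels in the example are placeholders, not details reported in the presentation.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus_grade(grades: Iterable[str], required: int = 3) -> Optional[str]:
    """Return the first grade to collect `required` matching votes, else None.

    `grades` is read in the order graders submit them, mimicking an image
    that is passed from one grader to the next until agreement is reached.
    """
    counts = Counter()
    for grade in grades:
        counts[grade] += 1
        if counts[grade] >= required:
            return grade  # stop as soon as enough graders agree
    return None  # not enough agreement yet; more graders would be needed


# Hypothetical sequence of grades for one image (labels are placeholders).
example = consensus_grade(["R1", "R2", "R2", "R1", "R2"])
print(example)
```

In this example the image would need five graders before a third matching grade is recorded.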

The resulting dataset was then split into a larger training set, used to train the deep learning algorithm, and a smaller validation set, used to assess internal performance. CERA researchers also validated the algorithm on an independent external dataset to rule out overfitting and to confirm that the algorithm generalized to other datasets.
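A split of this kind might look like the following Python sketch. The 80/20 ratio, the fixed seed, and the function name are assumptions made for illustration; the presentation did not report the actual proportions or the method used.

```python
import random


def split_dataset(image_ids, validation_fraction=0.2, seed=0):
    """Shuffle image IDs and split them into (training, validation) lists.

    The validation fraction is an assumed value, not one from the study.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_val = int(len(ids) * validation_fraction)
    return ids[n_val:], ids[:n_val]


# Toy example with 100 hypothetical image IDs.
train_ids, val_ids = split_dataset(range(100))
print(len(train_ids), len(val_ids))
```

The external datasets play a different role: they are never seen during training or internal validation, which is what lets them expose overfitting.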

“We collected over 35,000 images from 3 population-based studies,” Keel said, naming the National Indigenous Eye Health Survey of Australia, the Singapore-Malay Eye Study, and the AusDiab Study. “We suggested that this was relatively close to real-world representation, in that we had a range of ethnicities. With ethnicities, we know there’s quite a strong variation in fundus pigmentation, which is a potential source of error for these deep learning algorithms. And we also recognized, in the screening setting, the quality of images is impacted by a number of factors, including pupillary dilation.”

The 3 studies had a good variation in imaging protocols, Keel added, ranging from dilation of all patients in the Singapore-Malay study, to no dilation in the AusDiab study.

Next, Keel and colleagues developed 4 deep learning models: one for referable DR, one for DME, one to classify image quality as gradable or ungradable, and one to classify each image as macula-centered or disc-centered. When entered into the algorithm, the images first undergo preprocessing for normalization, and are then sequentially filtered through probability distributions for DR.
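How four such models could be chained is sketched below in Python. The ordering (quality check first), the `ScreeningResult` type, and every function name are hypothetical; the presentation did not describe the system’s internal interfaces, and the models here are stand-in callables rather than trained networks.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScreeningResult:
    gradable: bool
    field: Optional[str] = None              # "macula" or "disc"
    dr_probability: Optional[float] = None   # referable DR
    dme_probability: Optional[float] = None  # diabetic macular edema


def screen(image,
           quality_model: Callable,
           field_model: Callable,
           dr_model: Callable,
           dme_model: Callable,
           normalize: Callable = lambda img: img) -> ScreeningResult:
    """Run one image through preprocessing and the four models in turn."""
    x = normalize(image)          # preprocessing step
    if not quality_model(x):      # assumed: ungradable images stop here
        return ScreeningResult(gradable=False)
    return ScreeningResult(True, field_model(x), dr_model(x), dme_model(x))


# Stand-in models returning fixed values, for illustration only.
result = screen(
    image=[0.1, 0.2],
    quality_model=lambda x: True,
    field_model=lambda x: "macula",
    dr_model=lambda x: 0.91,
    dme_model=lambda x: 0.12,
)
print(result)
```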

Images in the presentation (which were not released publicly by ARVO and could not be photographed because of the association’s prohibitive recording restrictions) showed that the area under the curve (AUC) for both DR and DME exceeded 0.9, and that the total combined dataset produced specificity above 90% and an AUC of 0.95, “giving us confidence that the area under curve was quite generalizable to other datasets, different ethnicities and different imaging protocols,” Keel said.
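For readers unfamiliar with these metrics, the Python sketch below shows how sensitivity, specificity, and the area under the ROC curve are computed from labels and model scores. The six labels and scores are invented toy values, not study data, and the 0.5 threshold is an assumption.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the chance that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data: 1 = referable DR, 0 = not referable (invented values).
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.4, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(labels, predictions)
area = auc(labels, scores)
print(round(sens, 3), round(spec, 3), round(area, 3))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is why the two kinds of figures are reported side by side.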

Cases of false-positive identification displayed other retinal pathology, “suggesting that in fact, a fair majority of cases may have benefitted from referral,” Keel said. “Undoubtedly, the most common cause of false positives was mild to moderate NPDR, so misclassified earlier DR cases represent 85%. Other cases of false positives were AMD, myopathy, retinal vein occlusion, and retinal detachment.”

The most common cause of false negatives was intraretinal microvascular abnormalities (IRMA) and often quite subtle lesions, Keel said, “which really suggests that future optimization of our system with larger datasets with this particular lesion may in fact increase our sensitivity metric further.” Other causes of false negatives were missed PRP laser scars, missed preretinal hemorrhage, and some questionable vessels on poorer-quality images.

Future work will focus on assessing the real-world impact of the deep learning algorithm (DLA) screening model in terms of adherence to referral, new disease detection rates, and cost-effectiveness compared with other telemedicine models and usual care.

Researchers also noted the challenges inherent in convincing clinicians to adopt new technology. It will require “a major mind shift in how clinicians entrust their clinical care to machines,” Keel said.

However, that shift could be easier if results are strong enough. There are certainly target points for further optimization of the DLA, he said, but “in 97% of cases, traditional disease regions were highlighted, providing more evidence of the ability of this DLA.”
