Study Finds AI Chatbot Provides Inaccurate Information on Vitreoretinal Diseases


Large language model-based platforms provide largely inaccurate responses to questions concerning vitreoretinal disease and remain inconsistent on repeat queries.

Peter Y. Zhao, MD


New research suggests current large language model (LLM)-based platforms provide largely inaccurate responses to questions concerning vitreoretinal disease and give inconsistent answers when the same questions are repeated.¹

The analysis showed 50.0% of answers were materially different between the first and second question submissions, even though no functional changes had been made to the platform, indicating that the generated information was inconsistent.

“A greater degree of subspecialization in the field of vitreoretinal disease might explain the differences in accuracy,” wrote the investigative team, led by Peter Y. Zhao, MD, New England Eye Center, Tufts Medical Center. “Hallucination generating factually inaccurate response is a known issue with LLM-based platforms but has the potential to cause patient harm in the domain of medical knowledge.”

Patients often require accurate information on ophthalmic conditions to make informed medical decisions, but information on the internet often comes from unregulated or unverified sources, which reduces its reliability. Artificial intelligence (AI)-based language platforms, which are increasing in popularity, respond to user inquiries by generating paragraph-length answers.

This cross-sectional analysis evaluated the accuracy and reproducibility of a single chatbot’s responses to commonly asked patient questions about vitreoretinal disease. Investigators collected frequently asked questions from the internet on various vitreoretinal conditions and procedures, including:

Macular degeneration
Diabetic retinopathy
Retinal vein occlusion
Retinal tear or detachment
Posterior vitreous detachment
Vitreous hemorrhage
Epiretinal membrane
Macular hole
Central serous chorioretinopathy
Retina laser
Retinal surgery
Intravitreal injection

All questions were posed to the AI chatbot ChatGPT in January 2023. Responses were evaluated by 2 fellowship-trained vitreoretinal surgeons and graded as accurate if the entirety of the response was considered appropriate. To determine whether answers could change over time, investigators resubmitted questions to the same platform 14 days after the initial inquiry and compared the responses.

Upon analysis, only 8 (15.4%) of the 52 questions submitted initially received responses graded as completely accurate. When the questions were resubmitted, all 52 responses had changed, and 26 (50.0%) had changed materially.

Of these, 16 responses (30.8% of all 52) materially improved in accuracy, while 10 (19.2%) materially worsened. Investigators noted some responses contained inappropriate or potentially harmful medical advice.

In response to “How do you get rid of epiretinal membrane?”, the chatbot described vitrectomy but also included the incorrect options of injection therapy and laser therapy. In response to “What are the treatment options for central serous chorioretinopathy?”, the platform incorrectly stated that corticosteroids are used to reduce inflammation and fluid accumulation in the retina.

In fact, investigators noted corticosteroid therapy can exacerbate central serous chorioretinopathy.

One limitation of the chatbot is that it was designed for research rather than for medical use. Investigators also noted that, in earlier research, the same chatbot provided largely accurate responses in the field of preventive cardiovascular disease.²

Because the chatbot is continually updated and revised, investigators suggested that a future investigation of vitreoretinal disease could produce different findings.

“Overall, LLM-based platforms could be used by patients to obtain medical advice,” investigators wrote. “Ophthalmologists need to be aware of the limitations and potential for dissemination of misinformation associated with these AI platforms.”

References

1. Caranfa JT, Bommakanti NK, Young BK, Zhao PY. Accuracy of vitreoretinal disease information from an artificial intelligence chatbot. JAMA Ophthalmol. Published online August 3, 2023. doi:10.1001/jamaophthalmol.2023.3314
2. Sarraju A, Bruemmer D, Van Iterson E, Cho L, Rodriguez F, Laffin L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA. 2023;329(10):842-844. doi:10.1001/jama.2023.1044