ChatGPT effectively simplified information about more than 60 glomerular disease terms to enhance readability, but medical accuracy decreased.
ChatGPT may be an effective tool for simplifying information about glomerular diseases to enhance the readability of educational materials for patients, according to findings from a recent study.1
Results showed that ChatGPT successfully simplified the explanations of more than 60 terms related to glomerular disease to improve readability for patients. However, as simplification and subsequent readability increased, the medical accuracy of the information slightly decreased.1
“Communicating medical information between healthcare providers and patients presents a significant challenge due to varying levels of health literacy,” Jing Miao, MD, PhD, a nephrologist at Mayo Clinic, and colleagues wrote.1 “Simplifying medical information, especially for complex conditions like glomerular disease, to an average reading level can greatly benefit individuals with limited literacy skills. This strategy helps reduce health disparities and promotes greater equity in healthcare, ensuring that all patients can understand and actively engage with their health information.”
A large language model developed by OpenAI, ChatGPT uses deep learning techniques to produce human-like responses to natural language inputs across a broad spectrum of prompts. While its use in healthcare and medical domains is still being explored and refined, some experts believe it will be a promising tool for patients and healthcare professionals alike.2
To determine whether ChatGPT can accurately enhance patient comprehension of glomerular disease terms, investigators employed 2 queries to evaluate the performance of the advanced ChatGPT model, GPT-4, in interpreting 67 terms spanning a range of etiological categories: glomerular diseases associated with nephrotic syndrome; glomerular diseases associated with nephritis; glomerular diseases associated with complement disorders; paraprotein-mediated glomerular diseases; hereditary glomerular disorders; and other miscellaneous conditions.1
The first GPT-4 query asked for a general explanation, while the second tailored responses for patients with an education level of 8th grade or lower. To ensure the responses remained unbiased and independent of previous interactions, investigators initiated a new chat session for each glomerular disease term.1
The accuracy of the responses generated by GPT-4 was evaluated by 2 investigators on a scale of 1-5, with 1 being completely incorrect and 5 being correct and comprehensive. The average score from the 2 investigators was used for each response.1
The readability of the responses generated by GPT-4 was assessed using the Consensus Reading Grade (CRG) Level, which incorporates 7 readability indices, including the Flesch-Kincaid Grade (FKG) and Simple Measure of Gobbledygook (SMOG) indices. The Flesch Reading Ease (FRE) score was also used to evaluate readability.1
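For context, these indices are computed from simple text statistics. The sketch below uses the standard published formulas for FKG, FRE, and SMOG (not the study's own implementation, which is not described), taking word, sentence, syllable, and polysyllable counts as inputs; the sample counts are hypothetical:

```python
import math

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def smog_index(polysyllables: int, sentences: int) -> float:
    """SMOG grade: 1.0430*sqrt(polysyllables * 30/sentences) + 3.1291."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291

# Hypothetical passage statistics, for illustration only:
# 100 words across 5 sentences, 160 syllables, 12 polysyllabic words
print(flesch_kincaid_grade(100, 5, 160))
print(flesch_reading_ease(100, 5, 160))
print(smog_index(12, 5))
```

Higher FKG and SMOG values indicate a higher school-grade reading level, while a higher FRE score (on a 0-100 scale) indicates easier reading, which is why the two move in opposite directions in the study's results.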
On average, the general explanations received an accuracy score of 4.74 ± 0.31, indicating the responses were nearly all correct and comprehensive. However, the explanations tailored to an 8th-grade reading level or below received a lower accuracy score (4.23 ± 0.35; P <.0001).1
The CRG level of the general explanations averaged 14.09 ± 0.98. In contrast, the CRG level of the tailored explanations averaged 9.69 ± 0.76 (P <.0001). Specifically, the FKG level for the general explanations averaged 13.85 ± 1.19, while the tailored explanations averaged 8.72 ± 0.84, indicating an 8th-grade reading level (P <.0001).1
Investigators noted the SMOG index for the general explanations was 11.76 ± 0.93, but the tailored explanations had a significantly lower SMOG index of 7.25 ± 0.84 (P <.0001). Similarly, they pointed out the FRE score for the general explanations averaged 31.63 ± 6.97, indicating that the text is difficult to read, whereas the tailored explanations had a significantly higher FRE score of 63.52 ± 5.32 (P <.0001), suggesting the text is easier to read and understand.1
Of note, all Cohen’s d effect size values exceeded 0.8, highlighting substantial differences in readability and accuracy between the general and tailored explanation groups. Additionally, investigators observed a moderate negative correlation between the accuracy and readability scores (r = −0.417; P <.001), indicating that as readability improved, accuracy decreased.1
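To illustrate why these effects qualify as large, a pooled-standard-deviation Cohen's d (a common formulation assuming equal group sizes; the study does not specify its exact computation) can be applied to the reported accuracy scores:

```python
import math

def cohens_d(mean1: float, sd1: float, mean2: float, sd2: float) -> float:
    """Cohen's d using a pooled SD, assuming two groups of equal size."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd

# Reported accuracy scores: general 4.74 +/- 0.31 vs tailored 4.23 +/- 0.35
d_accuracy = cohens_d(4.74, 0.31, 4.23, 0.35)
print(round(d_accuracy, 2))  # well above the 0.8 threshold for a large effect
```

By Cohen's conventional benchmarks, d values of 0.2, 0.5, and 0.8 correspond to small, medium, and large effects, so a value above 0.8 for every comparison marks the readability-accuracy differences as substantial.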
They outlined multiple limitations to these findings, including the inability to fully capture the complexities of patient understanding and readability of medical educational materials; the lack of a human control population to evaluate the comprehensibility of the responses for the target group; and the potential lack of generalizability to populations with different literacy skills.1
“This study demonstrated that ChatGPT can effectively enhance the readability of educational materials for patients with glomerular diseases, though it also highlights the complex trade-off between simplifying content and maintaining medical accuracy,” investigators concluded.1
References