In this analysis, investigators assessed the ability of a ChatGPT-4-based chatbot to perform key functions on a teledermatology platform.
ChatGPT-4 shows promise in generating differential diagnoses and accurately describing dermatologic images, according to recent findings, suggesting the technology may have a role in the field.1
These conclusions came from new research comparing ChatGPT-4 to human teledermatologists, in which retrospective teledermatology consultations were reviewed and the AI model was applied to the same cases. The analysis was led by Jonathan Shapiro, MD, of Maccabi Healthcare Services in Israel.
Despite recent advancements in the latest ChatGPT model, a prior analysis that evaluated the platform on clinical images from an online dermatology resource found that the AI model correctly diagnosed only 23% of cases, suggesting the need for additional research into its use in dermatologic contexts.2
“Our study aimed to explore the ability of a ChatGPT-4-based chatbot to perform key functions in a [teledermatology] platform, including integrating metadata with patient-submitted clinical images, describing images, and compiling a differential diagnosis, while comparing its performance to that of human [teledermatologists],” Shapiro and colleagues wrote.1
The investigators drew patient data from Maccabi Healthcare Services (MHS), Israel's second-largest public healthcare provider. MHS had launched a web-based asynchronous teledermatology service in July 2019, designed for non-emergency cases.
This platform, which has since been upgraded for efficiency, allowed patients to submit photographs and detailed descriptions of their dermatologic conditions. Metadata gathered from teledermatology consultations included patient age and gender; the location, symptoms, and duration of the skin condition; and any additional details provided by the patient in free text.
MHS protocols required teledermatologists using the platform to record image descriptions, metadata, differential diagnoses, and treatment plans in the patient's electronic medical record. They would then recommend in-person follow-up for additional assessment when deemed necessary.
In the new analysis, Shapiro and colleagues sought to evaluate ChatGPT-4's diagnostic performance against that of teledermatologists by assessing routine teledermatology consultations alongside ChatGPT-4's evaluations of the same clinical case data.
All teledermatology consultations were carried out by board-certified dermatologists with a minimum of 2 years of post-residency experience. A total of 154 teledermatology cases from December 2023 to February 2024 were compared with ChatGPT-4's evaluations.
The investigators categorized diagnostic concordance between ChatGPT-4 and teledermatologists into 3 distinct levels: "Top 1," an exact match with the teledermatologist's diagnosis; "Top 3," the correct diagnosis listed within the top 3 options; and "Partial," a diagnosis similar but not identical to the teledermatologist's.
The research team also assessed the quality of image descriptions, both those generated by ChatGPT-4 and those provided by teledermatologists, using a set of 5 parameters: color, location, morphology, size, and surrounding area. The accuracy of each description was rated as Yes, No, or Partial.
Among the 154 cases reviewed, ChatGPT-4 achieved Top 1 diagnostic concordance in 70.8% of cases and Top 3 concordance in 87.7%, with partial concordance in 2.6% and discordant diagnoses in the remaining 9.7%.
In their evaluation of image descriptions, the investigators reported that ChatGPT-4 significantly outperformed teledermatologists across all 5 parameters. The AI model's descriptions were fully accurate in 84.4% of cases, partially accurate in 14.3%, and inaccurate in only 2 cases.
Overall, the findings underscore the potential for AI tools such as ChatGPT-4 to complement teledermatology by providing accurate image descriptions and diagnoses, though the authors note that additional research is warranted.
“This study suggests ChatGPT-4 may assist in asynchronous [teledermatology] tasks including analyzing metadata, describing clinical images, and delivering differential diagnoses,” they wrote. “However, given the limited scope, these findings are preliminary and require further research. Ethical considerations, particularly patient privacy, should be addressed in future studies as AI becomes more integrated into clinical practice.”1
References