In this analysis, investigators assessed the ability of a ChatGPT-4-based chatbot to perform key functions on a teledermatology platform.
ChatGPT-4 shows promise in generating differential diagnoses and accurately describing dermatologic images, according to recent findings, suggesting the technology may have a role in the field.1
These conclusions came from new research comparing ChatGPT-4 to human teledermatologists, in which retrospective teledermatology consultations were reviewed and the AI model was applied to the same cases. The analysis was led by Jonathan Shapiro, MD, of Maccabi Healthcare Services in Israel.
Despite recent advancements in the latest ChatGPT model, a prior analysis that evaluated the platform on clinical images from an online dermatology resource found that the AI model correctly diagnosed only 23% of cases, suggesting the need for additional research into its use in dermatologic contexts.2
“Our study aimed to explore the ability of a ChatGPT-4-based chatbot to perform key functions in a [teledermatology] platform, including integrating metadata with patient-submitted clinical images, describing images, and compiling a differential diagnosis, while comparing its performance to that of human [teledermatologists],” Shapiro and colleagues wrote.1
The investigators drew patient data from Maccabi Healthcare Services (MHS), Israel's second-largest public healthcare provider. MHS had launched a web-based asynchronous teledermatology service in July 2019, designed for non-emergency cases.
This platform, which has since been upgraded for efficiency, allowed patients to submit photographs and detailed descriptions of their dermatologic conditions. Metadata gathered from teledermatology consultations included patient age and gender; the location, symptoms, and duration of the skin condition; and any additional details provided by the patient in free text.
MHS protocols required teledermatologists using the platform to record image descriptions, metadata, differential diagnoses, and treatment plans in the patient's electronic medical record. They would then recommend in-person follow-up for additional assessment when deemed necessary.
In the new analysis, Shapiro and colleagues sought to evaluate ChatGPT-4's diagnostic performance against that of teledermatologists by assessing routine teledermatology consultations alongside ChatGPT-4's evaluations of the same clinical case data.
All teledermatology consultations were carried out by board-certified dermatologists with a minimum of 2 years of post-residency experience. A total of 154 teledermatology cases from December 2023 to February 2024 were compared with ChatGPT-4's evaluations.
The investigators categorized diagnostic concordance between ChatGPT-4 and teledermatologists into 3 distinct levels: "Top 1," an exact match with the teledermatologist's diagnosis; "Top 3," the correct diagnosis listed within the top 3 options; and "Partial," a diagnosis similar but not identical to the teledermatologist's.
The research team also assessed the quality of image descriptions, both those generated by ChatGPT-4 and those provided by teledermatologists, using a set of 5 parameters: color, location, morphology, size, and surrounding area. The accuracy of each description was rated as Yes, No, or Partial.
Among the 154 cases reviewed, ChatGPT-4 achieved Top 1 diagnostic concordance in 70.8% of cases and Top 3 concordance in 87.7%, with partial concordance in 2.6% and discordant diagnoses in the remaining 9.7%.
In their evaluation of image descriptions, the investigators reported that ChatGPT-4 significantly outperformed teledermatologists across all 5 parameters. The AI model's descriptions were fully accurate in 84.4% of cases, partially accurate in 14.3%, and inaccurate in only 2 cases.
Overall, the findings underscore the potential for AI tools such as ChatGPT-4 to complement teledermatology by providing accurate image descriptions and diagnoses, though the authors note that additional research is warranted.
“This study suggests ChatGPT-4 may assist in asynchronous [teledermatology] tasks including analyzing metadata, describing clinical images, and delivering differential diagnoses,” they wrote. “However, given the limited scope, these findings are preliminary and require further research. Ethical considerations, particularly patient privacy, should be addressed in future studies as AI becomes more integrated into clinical practice.”1
References