Alignment of ChatGPT Responses With AAD Guidelines for Cutaneous Melanoma
PRACTICE POINTS
- ChatGPT provides structured, educational-style responses with broad contextual detail but may omit key clinical nuances such as specific surgical considerations, including staged excision or Mohs micrographic surgery for melanoma in situ.
- Large language models should be viewed as a tool to supplement expert clinical judgment and established guidelines rather than as a standalone replacement for dermatologic decision-making.
To the Editor:
ChatGPT (OpenAI), a popular large language model that generates responses to user queries, has attracted substantial attention as a potential resource for patient education.1 Although prior studies have shown that ChatGPT can provide reliable general patient information, its alignment with the American Academy of Dermatology's (AAD's) guidelines for primary cutaneous melanoma (CM) and with more recent evidence in the literature has not been evaluated.2,3 In this study, we compared ChatGPT's answers to the 25 evidence-based questions that the AAD used to develop its 2019 recommendations for primary CM against those recommendations. Because the 2019 AAD guidelines included literature only through April 2017, we conducted an additional search (May 2017–February 2024) to assess ChatGPT's alignment with more recent evidence not captured in the guidelines.
On April 17, 2024, 2 authors (D.P. and A.F.) prompted ChatGPT with the 25 evidence-based questions from the 2019 AAD guidelines for the management of primary CM.4 ChatGPT's responses were compared with the AAD's published recommendations and cross-referenced with findings from our own search of PubMed articles indexed for MEDLINE using the phrase melanoma (cutaneous) and treatment, covering studies published from May 2017 to February 2024.
ChatGPT's answers to 23 of the 25 questions aligned with the AAD guidelines (Table 1). Where the guidelines were inconclusive regarding pathology, the model provided recommendations that were supported by our contemporary PubMed literature search. Of the 3 questions related to CM pathology, the AAD guidelines had sufficient evidence to make recommendations for 2. The first of these addressed the clinical information a pathologist needs to improve diagnosis (Table 2). ChatGPT's response to one question about staged excision and Mohs micrographic surgery for melanoma in situ did not align with the AAD guidelines (Table 3).



Our results showed that ChatGPT provided comprehensive responses aligned with current evidence on CM treatment, except for one surgery question for which its response differed from the AAD guidelines. Our findings are consistent with an observational study in which board-certified dermatologists rated ChatGPT's responses to melanoma-related questions at 4.88 on a scale of 1 to 5 (1, completely inaccurate information; 5, completely accurate and clinically sufficient for practice). The authors also found that ChatGPT gave vague advice, such as to "get regular skin exams," which is less specific than dermatologists' recommendations for annual, biannual, or more frequent examinations.5 This limitation mirrors our own finding that ChatGPT omitted key information in its answer to the surgery-related question, highlighting the challenge of relying on AI for nuanced clinical guidance.
We found that ChatGPT considered immunosuppression an important risk factor for CM. Similarly, a 2023 cohort study of 93 patients with melanoma and a history of immunosuppression reported that these patients had a higher risk for CM compared with a control group from the National Cancer Institute's Surveillance, Epidemiology, and End Results Program (standardized incidence ratio, 1.53; 95% CI, 1.12-2.04), indicating that the incidence of CM in immunocompromised patients was 53% higher than in an age- and sex-matched population cohort.6
ChatGPT's responses and the AAD guidelines also agreed that evidence linking pregnancy to an increased risk for CM remains inconclusive and that pregnant women should still undergo surveillance. A 2022 retrospective cohort study of 1406 women comparing pregnancy-associated melanoma with non–pregnancy-associated CM found no difference in overall survival (hazard ratio, 0.75; 95% CI, 0.54-1.05).7 However, postpartum cases were more likely than cases in nonpregnant women to have a tumor thickness of 2.01 to 4.00 mm (odds ratio, 1.75; 95% CI, 1.03-2.98), suggesting that pregnancy may affect tumor characteristics.7 Taken together, these findings underscore the importance of using AI tools such as ChatGPT as a supplement to, rather than a replacement for, expert clinical judgment and up-to-date medical guidelines.