, new research suggests.
“This study shows that a conversational AI program can generate credible medical information in response to common patient questions,” say the investigators, led by Tsung-Chun Lee, MD, division of gastroenterology and hepatology, Taipei Medical University Shuang Ho Hospital, New Taipei City, Taiwan.
“With dedicated domain training, there is meaningful potential to optimize clinical communication to patients undergoing colonoscopy,” they add.
The study was published online in Gastroenterology.
ChatGPT, developed by OpenAI, is a natural language processing tool that allows users to have personalized conversations with an artificial intelligence (AI) bot capable of providing a detailed response to any question posed.
For their first-of-its-kind study, Dr. Lee and colleagues assessed the quality of ChatGPT-generated answers to eight common patient questions about colonoscopy, including what a colonoscopy entails, why it’s performed, how to prepare for it, potential complications, what to expect after the procedure, and what happens with a positive/negative result.
They retrieved the questions from the websites of three randomly selected top hospitals for gastroenterology and gastrointestinal surgery and had ChatGPT (Jan. 30, 2023, version) answer the questions twice.
Using plagiarism detection software, they found that text similarity was extremely low between ChatGPT answers and those on hospital websites (0%-16%). Text similarity ranged from 28% to 77% between the two ChatGPT answers for the same question, except on the question of what to do after a positive colonoscopy result, which had 0% text similarity.
To objectively gauge the quality of the ChatGPT answers, four gastroenterologists (two senior gastroenterologists and two fellows) rated 36 pairs of common questions and answers on a seven-point Likert scale according to ease of understanding, scientific adequacy, and satisfaction with the answer.
The gastroenterologists rated the ChatGPT answers highly and similarly to non-AI answers for all three quality indicators, with some AI scores even higher than non-AI scores.
Interestingly, they could correctly identify AI-generated answers only 48% of the time. Three raters had an accuracy of less than 50%, whereas one (a fellow) was 81% accurate.
The researchers note that publications about ChatGPT in PubMed grew 10-fold from Feb. 3 to April 14, 2023, with topics such as board examinations, authorship, editorial policies, medical education, and clinical decision support.
Although in their early days, ChatGPT and other AI bots may represent a “transformative innovation” in how medical information is created by physicians and consumed by patients, they say.
Such tools could also save time for health care professionals.
“AI-generated medical information, with appropriate provider oversight, accreditation, and periodic surveillance, could improve efficiency of care and free providers for more cognitively intensive patient communications,” they add.
However, several challenges remain, such as the lack of clinical evidence in constructing AI-generated answers.
In addition, AI-generated answers were written at significantly higher reading levels than were answers on hospital websites, which could be a barrier for some patients.
The study received no specific funding. The authors have declared no relevant conflicts of interest.
A version of this article first appeared on Medscape.com.