Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis

Ghanem, Yazid K.; Rouhi, Armaun D.; Al-Houssan, Ammr; Saleh, Zena; Moccia, Matthew C.; Joshi, Hansa; Dumon, Kristoffel R.; Hong, Young; Spitz, Francis; Joshi, Amit R.; Kwiatt, Michael

doi:10.1007/s00464-024-10739-5

Cited by 15 publications

(2 citation statements)

References 27 publications

“…In a study on ChatGPT-4’s efficacy in providing information about periodontal diseases to patients, the responses were rated as ‘good’ based on total DISCERN scores [ 32 ]. Similarly, in a study assessing the quality of AI-generated medical information on appendicitis, ChatGPT-4 and Bard received DISCERN scores of 62.0 and 62.3, respectively, categorized as having “good” accuracy [ 33 ]. Our findings are consistent with these studies, as both BingAI and Gemini were rated as “good” while ChatGPT-4’s responses achieved “excellent” accuracy according to the DISCERN scale.…”

Section: Discussion

mentioning

confidence: 99%

“…The first part of the tool evaluates the resource's reliability through eight questions; the second part assesses the specifics of the treatment for the disease through seven questions, and the final question evaluates the overall quality. The total score, excluding the last question, ranges from 16 to 75 and is used to categorize the quality into five levels: excellent (63-75), good (51-62), moderate (39-50), poor (27-38), or very poor (16-26). The EQIP tool, developed by health professionals and patient information managers, provides a comprehensive framework for assessing the quality of health information resources, such as websites and patient leaflets [20].…”

mentioning

confidence: 99%
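The DISCERN bands quoted in the statement above can be expressed as a small lookup. This is a minimal sketch, not code from any of the cited studies: the function name `discern_category` is hypothetical, and the handling of non-integer mean scores (each band runs from its lower bound up to the next band's lower bound, so 62.3 falls in "good") is an assumption chosen to match the first citation statement, which reports 62.0 and 62.3 as "good".

```python
# Lower bounds of the DISCERN quality bands, as quoted in the citation
# statement (excellent 63-75, good 51-62, moderate 39-50, poor 27-38,
# very poor 16-26). Ordered from highest band to lowest.
DISCERN_LOWER_BOUNDS = [
    (63, "excellent"),
    (51, "good"),
    (39, "moderate"),
    (27, "poor"),
    (16, "very poor"),
]


def discern_category(total_score: float) -> str:
    """Map a total DISCERN score to its quality band.

    Assumption: fractional (mean) scores belong to the band whose lower
    bound they meet or exceed, so 62.3 is still "good".
    """
    if not 16 <= total_score <= 75:
        raise ValueError("total DISCERN score must lie between 16 and 75")
    for lower_bound, label in DISCERN_LOWER_BOUNDS:
        if total_score >= lower_bound:
            return label
    # Unreachable because of the range check above; kept for clarity.
    raise ValueError("score did not match any band")


if __name__ == "__main__":
    # Scores reported in the statements and abstract on this page.
    for score in (62.0, 62.3, 63, 53):
        print(score, discern_category(score))
```

Under these assumptions, the mean scores of 62.0 and 62.3 both map to "good", and a score of 63 maps to "excellent", which agrees with the categories reported in the citing study.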

Exploring the Role of ChatGPT-4, BingAI, and Gemini as Virtual Consultants to Educate Families about Retinopathy of Prematurity

Durmaz Engin, Karatas, Ozturk. 2024. Children.

Background: Large language models (LLMs) are becoming increasingly important as they are being used more frequently for providing medical information. Our aim is to evaluate the effectiveness of electronic artificial intelligence (AI) large language models (LLMs), such as ChatGPT-4, BingAI, and Gemini in responding to patient inquiries about retinopathy of prematurity (ROP). Methods: The answers of LLMs for fifty real-life patient inquiries were assessed using a 5-point Likert scale by three ophthalmologists. The models’ responses were also evaluated for reliability with the DISCERN instrument and the EQIP framework, and for readability using the Flesch Reading Ease (FRE), Flesch-Kincaid Grade Level (FKGL), and Coleman-Liau Index. Results: ChatGPT-4 outperformed BingAI and Gemini, scoring the highest with 5 points in 90% (45 out of 50) and achieving ratings of “agreed” or “strongly agreed” in 98% (49 out of 50) of responses. It led in accuracy and reliability with DISCERN and EQIP scores of 63 and 72.2, respectively. BingAI followed with scores of 53 and 61.1, while Gemini was noted for the best readability (FRE score of 39.1) but lower reliability scores. Statistically significant performance differences were observed particularly in the screening, diagnosis, and treatment categories. Conclusion: ChatGPT-4 excelled in providing detailed and reliable responses to ROP-related queries, although its texts were more complex. All models delivered generally accurate information as per DISCERN and EQIP assessments.

Enhancing Health Literacy: Evaluating the Readability of Patient Handouts Revised by ChatGPT's Large Language Model

Swisher, Wu, Liu et al. 2024. Otolaryngol Head Neck Surg.

Objective: To use an artificial intelligence (AI)‐powered large language model (LLM) to improve readability of patient handouts. Study Design: Review of online material modified by AI. Setting: Academic center. Methods: Five handout materials obtained from the American Rhinologic Society (ARS) and the American Academy of Facial Plastic and Reconstructive Surgery websites were assessed using validated readability metrics. The handouts were inputted into OpenAI's ChatGPT‐4 after prompting: “Rewrite the following at a 6th‐grade reading level.” The understandability and actionability of both native and LLM‐revised versions were evaluated using the Patient Education Materials Assessment Tool (PEMAT). Results were compared using Wilcoxon rank‐sum tests. Results: The mean readability scores of the standard (ARS, American Academy of Facial Plastic and Reconstructive Surgery) materials corresponded to “difficult,” with reading categories ranging between high school and university grade levels. Conversely, the LLM‐revised handouts had an average seventh‐grade reading level. LLM‐revised handouts had better readability in nearly all metrics tested: Flesch‐Kincaid Reading Ease (70.8 vs 43.9; P < .05), Gunning Fog Score (10.2 vs 14.42; P < .05), Simple Measure of Gobbledygook (9.9 vs 13.1; P < .05), Coleman‐Liau (8.8 vs 12.6; P < .05), and Automated Readability Index (8.2 vs 10.7; P = .06). PEMAT scores were significantly higher in the LLM‐revised handouts for understandability (91 vs 74%; P < .05) with similar actionability (42 vs 34%; P = .15) when compared to the standard materials. Conclusion: Patient‐facing handouts can be augmented by ChatGPT with simple prompting to tailor information with improved readability. This study demonstrates the utility of LLMs to aid in rewriting patient handouts and may serve as a tool to help optimize education materials. Level of Evidence: Level VI.

Comparative Analysis of ChatGPT and Google Gemini in the Creation of Patient Education Materials for Acute Appendicitis, Cholecystitis, and Hydrocele

Joseph, Sanghavi, Kanyal et al. 2024. Indian J Surg.
