2024
DOI: 10.1007/s00464-024-10739-5

Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis

Yazid K. Ghanem,
Armaun D. Rouhi,
Ammr Al-Houssan
et al.

Abstract: Introduction: Generative artificial intelligence (AI) chatbots have recently been posited as potential sources of online medical information for patients making medical decisions. Existing online patient-oriented medical information has repeatedly been shown to be of variable quality and difficult readability. Therefore, we sought to evaluate the content and quality of AI-generated medical information on acute appendicitis. Methods: A modified DISCERN assess…

Cited by 15 publications (2 citation statements) · References 27 publications
“…In a study on ChatGPT-4’s efficacy in providing information about periodontal diseases to patients, the responses were rated as ‘good’ based on total DISCERN scores [ 32 ]. Similarly, in a study assessing the quality of AI-generated medical information on appendicitis, ChatGPT-4 and Bard received DISCERN scores of 62.0 and 62.3, respectively, categorized as having “good” accuracy [ 33 ]. Our findings are consistent with these studies, as both BingAI and Gemini were rated as “good” while ChatGPT-4’s responses achieved “excellent” accuracy according to the DISCERN scale.…”
Section: Discussion. Classification: mentioning (confidence: 99%)
“…The first part of the tool evaluates the resource's reliability through eight questions; the second part assesses the specifics of the treatment for the disease through seven questions, and the final question evaluates the overall quality. The total score, excluding the last question, ranges from 16 to 75 and is used to categorize the quality into five levels: excellent (63-75), good (51-62), moderate (39-50), poor (27-38), or very poor (16-26). The EQIP tool, developed by health professionals and patient information managers, provides a comprehensive framework for assessing the quality of health information resources, such as websites and patient leaflets [20].…”
Classification: mentioning (confidence: 99%)
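As a reading aid, the DISCERN score banding quoted above can be expressed as a small Python sketch. This is not code from either cited study; the function name discern_band and the example score are illustrative assumptions, and only the band cutoffs come from the quotation.

# A minimal sketch, assuming only the score bands quoted above;
# discern_band is an illustrative name, not from either paper.
def discern_band(total_score: int) -> str:
    """Map a total DISCERN score (per the quotation, 16-75) to its quality band."""
    if not 16 <= total_score <= 75:
        raise ValueError("total DISCERN score must lie in 16-75")
    if total_score >= 63:
        return "excellent"
    if total_score >= 51:
        return "good"
    if total_score >= 39:
        return "moderate"
    if total_score >= 27:
        return "poor"
    return "very poor"

# The appendicitis study's mean scores of roughly 62 fall in the "good" band:
print(discern_band(62))  # -> good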