2023
DOI: 10.21203/rs.3.rs-2566942/v1
Preprint

Assessing the Accuracy and Reliability of AI-Generated Medical Responses: An Evaluation of the ChatGPT Model

Abstract: Background: Natural language processing models such as ChatGPT can generate text-based content and are poised to become a major information source in medicine and beyond. The accuracy and completeness of ChatGPT for medical queries are not known. Methods: Thirty-three physicians across 17 specialties generated 284 medical questions that they subjectively classified as easy, medium, or hard with either binary (yes/no) or descriptive answers. The physicians then graded ChatGPT-generated answers to these question…

Cited by 262 publications (208 citation statements)
References 10 publications
“…Such artificial intelligence (AI) chatbots generate responses based on a large corpus of data including articles, websites, and other publicly available text data. 1 Artificial intelligence chatbots are rapidly becoming a primary source of information for patients, and chatbot responses to medical queries may influence health-related behavior. Among AI chatbots, ChatGPT (OpenAI) has shown promise in accurately answering medical questions, even US Medical Licensing Examination-style questions.…”
Section: Results (mentioning; confidence: 99%)
“…They rated the output not “so incorrect as to cause patient harm” 91% to 93% of the time, but concordance with the results generated by a consultation service run by physicians and informatics experts analyzing aggregated electronic health record data was just 21% to 41%. Another study in which physicians evaluated ChatGPT 3.5 output on 180 clinical queries found that the mean score was 4.4 of 6 for accuracy and 2.4 for completeness, with 8% of answers scored as completely incorrect. In a third study, ChatGPT 3.5 responses to 36 clinical vignettes, compared with the clinical manual from which the vignettes were drawn, were scored as 72% accurate on average.…”
(mentioning; confidence: 99%)
“…Another study in which physicians evaluated ChatGPT 3.5 output on 180 clinical queries found that the mean score was 4.4 of 6 for accuracy and 2.4 for completeness, with 8% of answers scored as completely incorrect. 6 In a third study, ChatGPT 3.5 responses to 36 clinical vignettes, compared with the clinical manual from which the vignettes were drawn, were scored as 72% accurate on average. The researchers characterized this as "impressive accuracy," but acknowledged that even small errors can harm patients.…”
(mentioning; confidence: 99%)
“…The daunting challenge requires critical and insightful analysis of the data by expert minds to discern reliable data, which is not simply possible by text mining. In clinical medicine, preliminary studies suggest that ChatGPT could generate reasonably but not perfectly accurate responses to medical questions, write patient clinic letters, or summarize discharge notes [2][3][4][5][6]. However, all ChatGPT-generated medical documents must be reviewed and verified by experts.…”
(mentioning; confidence: 99%)