2023
DOI: 10.1001/jamanetworkopen.2023.36483

Accuracy and Reliability of Chatbot Responses to Physician Questions

Rachel S. Goodman,
J. Randall Patrinely,
Cosby A. Stone
et al.

Abstract: Importance: Natural language processing tools, such as ChatGPT (generative pretrained transformer, hereafter referred to as chatbot), have the potential to radically enhance the accessibility of medical information for health professionals and patients. Assessing the safety and efficacy of these tools in answering physician-generated questions is critical to determining their suitability in clinical settings, facilitating complex decision-making, and optimizing health care efficiency. Objective: To assess the accur…

Cited by 167 publications (32 citation statements)
References 11 publications
“…The influence of ChatGPT is attributed to its conversational prowess and its performance, which approaches or matches human-level competence in cognitive tasks, spanning various domains including medicine. [16] ChatGPT has achieved commendable results in the United States Medical Licensing Examinations, leading to discussions about the readiness of LLM applications for integration into clinical [17-19], educational [20-22], and research [23] environments.…”
Section: Introduction (mentioning)
confidence: 99%
“…We compared answers to clinical questions and case management generated by GPT-4 and fellowship-trained retina and glaucoma specialists. We compared the accuracy and completeness of answers, evaluated using a Likert scale, which aligns with a validated approach. Secondary end points explored rating differences between trainees and attendings to assess whether the level of training influenced the perception of the LLM’s responses.…”
Section: Methods (mentioning)
confidence: 99%
“…While these studies showcase the potential of LLM chatbots in specific domains, a broader evaluation of their accuracy, including in comparison with attending-level ophthalmologists, is warranted, particularly for addressing real-life clinical case scenarios. In this study, we compared an LLM chatbot’s responses with those of fellowship-trained glaucoma and retina specialists to explore the potential of LLMs in clinical ophthalmology.…”
Section: Introduction (mentioning)
confidence: 99%
“…The use of large language models such as GPT-4, which are “generative artificial intelligence” tools capable of creating natural, human-sounding prose in response to a plain-language query, and their incorporation into search engines, promise to make it easier for patients to directly ask questions related to preanesthetic preparation. The accuracy of large language models in answering medical questions has generally been impressive [1-3] but has not been evaluated for preanesthetic queries. We evaluated the ability of the widely accessible model GPT-4 to provide reasonable responses to common preanesthetic patient questions, compared to online published resources.…”
Section: To the Editor (mentioning)
confidence: 99%
“…Two sessions with ChatGPT were used because the software regenerates new responses when reprompted, and the responses may differ in quality. [1] Survey participants were preoperative anesthesia experts known to the investigators, and other similar experts suggested by this cohort, nearly all of whom were academicians involved with preoperative assessment (total solicited N = 210). The survey instructions asked raters to “evaluate answers to questions about anesthesia care that patients may ask.…”
Section: To the Editor (mentioning)
confidence: 99%