2023
DOI: 10.1002/ohn.489
Accuracy of ChatGPT‐Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis

Luigi Angelo Vaira,
Jerome R. Lechien,
Vincenzo Abbate
et al.

Abstract:
Objective: To investigate the accuracy of Chat‐Based Generative Pre‐trained Transformer (ChatGPT) in answering questions and solving clinical scenarios of head and neck surgery.
Study Design: Observational and valuative study.
Setting: Eighteen surgeons from 14 Italian head and neck surgery units.
Methods: A total of 144 clinical questions encompassing different subspecialities of head and neck surgery and 15 comprehensive clinical scenarios were developed. Questions and scenarios were inputted into ChatGPT‐4, and the res…
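The abstract describes submitting standardized clinical questions to ChatGPT‐4 and collecting the answers for expert rating. The excerpt does not include the authors' actual prompts, question bank, or rating rubric, so the snippet below is only a minimal sketch of the collection step, assuming access to OpenAI's chat completions API; the two questions and the output file name are illustrative placeholders, not material from the study.

```python
# Minimal sketch (not the study's protocol): send clinical questions to a
# GPT-4 model via OpenAI's chat completions API and store the answers so
# clinicians can later score them for accuracy and completeness.
import csv
from openai import OpenAI  # official openai Python package (v1.x)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical example questions; the study used 144 questions and 15 scenarios.
questions = [
    "What are the indications for elective neck dissection in early-stage oral cancer?",
    "How should a displaced mandibular condyle fracture be managed in an adult patient?",
]

rows = []
for q in questions:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": q}],
        temperature=0,  # keep answers as repeatable as possible
    )
    rows.append({"question": q, "answer": response.choices[0].message.content})

# Save question-answer pairs for independent expert review.
with open("chatgpt_answers.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "answer"])
    writer.writeheader()
    writer.writerows(rows)
```

In the study itself, the generated answers were evaluated by the participating surgeons; the sketch above covers only the answer-collection step.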

Cited by 50 publications (32 citation statements). References: 36 publications.
Citation statements (ordered by relevance):
“…The authors were primarily affiliated with institutions in the United States (n=47 of 122 different countries identified per publication, 38.5%), followed by Germany (n=11/122, 9%), Turkey (n=7/122, 5.7%), the United Kingdom (n=6/122, 4.9%), China/Australia/Italy (n=5/122, 4.1%, respectively), and 24 (n=36/122, 29.5%) other countries. Most studies examined one or more applications based on the GPT-3.5 architecture (n=66 of 124 different LLMs examined per study, 53.2%) [13, 26–29, 31–34, 36–40, 42–49, 52–54, 56–61, 63, 65–67, 71, 72, 74, 75, 77, 78, 81–89, 91, 92, 94, 95, 97–100, 102–104, 106–109, 111], followed by GPT-4 (n=33/124, 26.6%) [13, 25, 27, 29, 30, 34–36, 41, 43, 50, 51, 54, 55, 58, 61, 64, 68–70, 74, 76, 79–81, 83, 87, 89, 90, 93, 96, 98, 99, 101, 105], Bard (n=10/124, 8.1%; now known as Gemini) [33, 48, 49, 55, 73, 74, 80, 87, 94, 99], Bing Chat (n=7/124, 5.7%; now Microsoft Copilot) [49, 51, 55, 73, 94, 99, 110], and other applications based on Bidirectional Encoder Representations from Transformers (BERT; n=4/124, 3...…”
Section: Results (mentioning)
confidence: 99%
“…Most studies examined one or more applications based on the GPT-3.5 architecture (n=66 of 124 different LLMs examined per study, 53.2%) [13, 26–29, 31–34, 36–40, 42–49, 52–54, 56–61, 63, 65–67, 71, 72, 74, 75, 77, 78, 81–89, 91, 92, 94, 95, 97–100, 102–104, 106–109, 111], followed by GPT-4 (n=33/124, 26.6%) [13, 25, 27, 29, 30, 34–36, 41, 43, 50, 51, 54, 55, 58, 61, 64, 68–70, 74, 76, 79–81, 83, 87, 89, 90, 93, 96, 98, 99, 101, 105], Bard (n=10/124, 8.1%; now known as Gemini) [33, 48, 49, 55, 73, 74, 80, 87, 94, 99], Bing Chat (n=7/124, 5.7%; now Microsoft Copilot) [49, 51, 55, 73, 94, 99, 110], and other applications based on Bidirectional Encoder Representations from Transformers (BERT; n=4/124, 3.2%) [13, 83, 84], Large Language Model Meta-AI (LLaMA; n=3/124, 2.4%) [55], or Claude by Anthropic (n=1/124, 0.8%) [55]. The majority of applications were p...…”
Section: Results (mentioning)
confidence: 99%
“…These tools have primarily been designed for manual, human-centric assessments and are not compatible with AI-generated outputs. To date, no validated tool exists to accurately assess the health information provided by ChatGPT, and the few clinical studies published on this topic have used non-validated instruments [9, 10, 17, 24–27].…”
Section: Introduction (mentioning)
confidence: 99%