2023
DOI: 10.4174/astr.2023.104.5.269

ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models

Abstract: Purpose This study aimed to assess the performance of ChatGPT, specifically the GPT-3.5 and GPT-4 models, in understanding complex surgical clinical information and its potential implications for surgical education and training. Methods The dataset comprised 280 questions from the Korean general surgery board exams conducted between 2020 and 2022. Both GPT-3.5 and GPT-4 models were evaluated, and their performances were compared using the McNemar test. Result…
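The paired comparison described in the abstract can be illustrated in a few lines of code. The sketch below is a minimal example, assuming per-question correctness of the two models has been tallied into a 2x2 contingency table; the counts are placeholders, not the paper's data, and the test is run with statsmodels.

```python
# Hypothetical sketch of the comparison described in the abstract: a McNemar
# test on per-question correctness of GPT-3.5 vs GPT-4 over 280 questions.
# The counts below are placeholders, NOT the paper's actual results.
from statsmodels.stats.contingency_tables import mcnemar

# Rows = GPT-3.5 (correct, wrong); columns = GPT-4 (correct, wrong).
table = [[120, 20],   # GPT-3.5 correct: GPT-4 correct / GPT-4 wrong
         [90,  50]]   # GPT-3.5 wrong:   GPT-4 correct / GPT-4 wrong

# exact=True runs the exact binomial test on the discordant pairs (20 vs 90),
# which is appropriate for paired yes/no outcomes like per-question accuracy.
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic}, p-value={result.pvalue:.4f}")
```

Only the discordant cells (questions one model got right and the other got wrong) drive the test, which is why a paired design like this is more sensitive than comparing the two overall accuracy percentages.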


Cited by 112 publications (41 citation statements). References 10 publications.
“…Recent studies show GPT-4 outperformed GPT-3.5 by 24%-30% in various medical examinations. 13,14,21,23 These findings indicate a significant enhancement in the model's capabilities. However, a study using the American College of Gastroenterology Test found GPT-3.5 and GPT-4 had scores of 65% and 62%, respectively.…”
Section: Discussion (mentioning)
confidence: 73%
“…As a result, questions that involved visual elements, such as clinical images, medical photographs, and graphs, were excluded from our assessment, following the approach taken by previous studies. 7,8,11,13,14…”
Section: Methods (mentioning)
confidence: 99%
“…These comparisons highlighted the potential of ChatGPT in higher educational assessments; nevertheless, they showed the importance of ongoing refinement of these models and the dangers of the inaccuracies they pose (Lo, 2023; Sallam, 2023; Sallam et al., 2023d; Gill et al., 2024). However, making direct comparisons across variable studies can be challenging due to differences in models implemented, subject fields of the exams, test dates, and the exact approaches of prompt construction (Holmes et al., 2023; Huynh Linda et al., 2023; Meskó, 2023; Oh et al., 2023; Skalidis et al., 2023; Yaa et al., 2023).…”
Section: Discussion (mentioning)
confidence: 99%
“…Questions, along with their multiple-choice answers, were presented to the model, followed by the instruction, 'Give the number of the best answer. Start your response with "The answer is:"' The goal of this approach was to have the LLM respond with just the multiple-choice answer (1-5) and not provide a lengthy (costly) explanation.…”
Section: AI Prompting Methodology (mentioning)
confidence: 99%
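The prompt format this citing study describes is easy to reproduce. Below is a minimal sketch, assuming the OpenAI chat completions API; the helper names (build_prompt, ask) and the model string are illustrative assumptions, not code from the study.

```python
# Minimal sketch of the prompting approach quoted above; the helpers and the
# choice of the OpenAI chat API are assumptions, not the citing study's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def build_prompt(question: str, choices: list[str]) -> str:
    # Number the options 1..n, then append the fixed instruction so the
    # model returns only the option number instead of a long explanation.
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(choices, start=1))
    return (f"{question}\n{numbered}\n"
            'Give the number of the best answer. '
            'Start your response with "The answer is:"')

def ask(question: str, choices: list[str], model: str = "gpt-4") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(question, choices)}],
    )
    return response.choices[0].message.content  # e.g. 'The answer is: 3'
```

Pinning the response to a fixed prefix makes automated scoring a simple string match on the returned option number, and it also keeps completion length, and therefore token cost, low.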
“…GPT 4.0 has proven a remarkable ability in assessing knowledge in specialised domains such as medicine, law, and business [2-4], areas that have historically been the exclusive purview of professionals. Particularly noteworthy is its exceptional performance on assessments like the Korean general surgery board exam, the United States Medical Licensing Exam, and the Wharton MBA final exam, each achieved without fine-tuning of the pretrained model [5-7].…”
Section: Introduction (mentioning)
confidence: 99%