2023
DOI: 10.3171/2023.2.jns23419
ChatGPT versus the neurosurgical written boards: a comparative analysis of artificial intelligence/machine learning performance on neurosurgical board–style questions

Cited by 43 publications (14 citation statements) · References 1 publication
“…When assessed on more than 2,000 questions using all parts of the CNS SANS question bank, ChatGPT achieved a fairly unimpressive overall accuracy of 50.4% (1,068/2,120). Our findings corroborate those of Hopkins et al., who found a similar accuracy of 54.9% (262/477) using non-imaging questions from another question bank [19]. In contrast, Ali et al. reported a much higher accuracy of 73.4% (367/500) using both imaging and non-imaging questions from part one of the CNS SANS question bank [20].…”
Section: Discussion (supporting)
confidence: 92%
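The three accuracy figures quoted above follow directly from their raw fractions. As a quick sanity check, here is a minimal Python sketch that recomputes them; the labels are illustrative, taken from the quoted statement rather than from any published code.

```python
# Recompute the accuracies quoted in the citation statement above.
# Fractions are copied verbatim from the quote; labels are illustrative.
results = {
    "CNS SANS, all parts (quoted study)": (1068, 2120),  # reported as 50.4%
    "Hopkins et al., non-imaging [19]": (262, 477),      # reported as 54.9%
    "Ali et al., SANS part one [20]": (367, 500),        # reported as 73.4%
}

for label, (correct, total) in results.items():
    # Format the ratio as a percentage with one decimal place.
    print(f"{label}: {correct}/{total} = {correct / total:.1%}")
```

Running this prints 50.4%, 54.9%, and 73.4%, matching the quoted values.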
“…Overall, the phenomenal improvement in the test-taking performance of ChatGPT 4 compared to ChatGPT 3.5 raises intriguing questions regarding future applications and implications of AI in medical education and diagnostics. AI has shown its prowess not only on the USMLE examinations in medical education but also on advanced examinations, such as the neurosurgical written boards [16]. This phenomenon ventures into other aspects of medicine as well, including research and clinical performance [17].…”
Section: Principal Findings (mentioning)
confidence: 99%
“…Despite not being trained on a specific data set, ChatGPT performed at the level of a first-year resident in plastic surgery on the in-service training exam [7,8]. In neurosurgery, ChatGPT performed worse than the average user on Self-Assessment Neurosurgery questions but better than residents in some topics [9]. Clearly, there is already some rudimentary capacity in providing specialty care.…”
Section: Discussion (mentioning)
confidence: 99%