2023
DOI: 10.1038/s41598-023-43436-9

Comparing ChatGPT and GPT-4 performance in USMLE soft skill assessments

Dana Brin,
Vera Sorin,
Akhil Vaid
et al.

Abstract: The United States Medical Licensing Examination (USMLE) has been a subject of performance study for artificial intelligence (AI) models. However, their performance on questions involving USMLE soft skills remains unexplored. This study aimed to evaluate ChatGPT and GPT-4 on USMLE questions involving communication skills, ethics, empathy, and professionalism. We used 80 USMLE-style questions involving soft skills, taken from the USMLE website and the AMBOSS question bank. A follow-up query was used to assess th…
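The abstract describes the evaluation protocol only at a high level, and the study's querying code is not reproduced here. As a rough illustration of how such a multiple-choice evaluation loop might be scripted, the sketch below assumes the OpenAI Python SDK's chat-completions endpoint; `QUESTIONS` and `ask_model` are hypothetical names, and the question shown is a placeholder, not an item from the study.

```python
# Minimal sketch of an MCQ evaluation loop (illustrative only; the study's
# actual querying code is not published). Assumes the OpenAI Python SDK
# with an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical question format; the study drew 80 USMLE-style items from
# the USMLE website and the AMBOSS question bank.
QUESTIONS = [
    {
        "stem": "A patient declines a recommended treatment after a clear "
                "explanation of the risks. Which response is most appropriate?",
        "choices": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "answer": "C",
    },
]

def ask_model(model: str, q: dict) -> str:
    """Pose one multiple-choice question and return the model's letter pick."""
    options = "\n".join(f"{k}. {v}" for k, v in q["choices"].items())
    prompt = f"{q['stem']}\n{options}\nAnswer with a single letter."
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Take the first character of the reply as the chosen option letter.
    return reply.choices[0].message.content.strip()[0]

correct = sum(ask_model("gpt-4", q) == q["answer"] for q in QUESTIONS)
print(f"gpt-4: {correct}/{len(QUESTIONS)} correct")
```

A follow-up query (for example, asking the model to justify or revise its answer) could be issued in the same loop; the abstract is truncated before it specifies what the study's follow-up query assessed.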

Cited by 126 publications (39 citation statements)
References 14 publications
“…Previous evaluations excluded questions with images owing to the single-modality limitation of ChatGPT and GPT-4. 20,38–40 Our findings revealed that while medical students' performance decreased linearly as question difficulty increased, GPT-4V's performance stayed relatively stable. When hints were provided, GPT-4V's performance stayed almost the same across questions at all difficulty levels, as shown in Figure 2.…”
Section: Discussion
confidence: 94%
“…Previous evaluations excluded questions with images owing to the single-modality limitation of ChatGPT and GPT-4. 20,38–40…”
Section: Discussion
confidence: 99%
“…115 However, even when trained for general purposes, ChatGPT has previously been shown to pass the United States Medical Licensing Examination (USMLE), the German State Examination in Medicine, and even a radiology board-style examination without images. 116–119 Although outperformed on specific tasks by specialized medical LLMs, such as Google's MedPaLM-2, this suggests that general-purpose LLMs can comprehend complex medical literature and case scenarios to a degree that meets professional standards. 120 Furthermore, given the large amounts of data on which proprietary models such as ChatGPT are trained, it is not unlikely that they have been exposed to more medical data overall than smaller specialized models, despite being generalist models.…”
Section: Discussion
confidence: 99%
“…We also need to examine our current pedagogy: MCQs alone may no longer be enough to evaluate student or trainee understanding. 25–27 As an introduction to AI in Medicine, I also agree that we should include these LLMs during clinicopathological conferences. These may give medical students and trainees an opportunity to develop critical thinking by identifying the models' weaknesses and strengths, and to propose solutions with or without the collaboration of Computer Science colleagues.…”
Section: As a Teacher
confidence: 99%
“…It would be interesting to explore these interactions in a patient-perspective type of study. We also need to examine our current pedagogy: MCQs alone may no longer be enough to evaluate student or trainee understanding. 25–27 As an introduction to AI in Medicine, I also agree that we should include these LLMs during clinicopathological conferences.…”
Section: As a Teacher
confidence: 99%