2023
DOI: 10.1097/upj.0000000000000406

RETRACTED: New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology

Abstract: Introduction: Large language models have demonstrated impressive capabilities, but their application to medicine remains unclear. We seek to evaluate the use of ChatGPT on the American Urological Association Self-assessment Study Program as an educational adjunct for urology trainees and practicing physicians. Methods: One hundred fifty questions from the 2022 Self-assessment Study Program exam were screened, and those containing visual assets (n=15) were removed. The remaining items were encoded as open-ended or multi…

Cited by 44 publications (24 citation statements). References 17 publications.

Citation statements:

“…Overall, the results of this study align well with our previous exploration 3 of ChatGPT: not only did Google Bard perform poorly on the 2022 SASP for urology, but Bard’s responses also revealed concerning flaws. Although Bard is claimed to gather research and cite sources with real-time web searches, it still “hallucinates” facts (like ChatGPT).…”
Supporting (confidence: 88%)

“…Unlike ChatGPT, Bard boasts the ability to cite sources, interpret images, and perform real-time web searches for up-to-date information. Given our previous exploration 3 of AI performance on the 2022 AUA Self-Assessment Study Program (SASP), 4 we sought to evaluate Bard’s performance and assess its utility as an educational tool for urology trainees and physicians.…”
Mentioning (confidence: 99%)

“…6 Other studies have used more complex questions and clinical scenarios and reported accuracy rates ranging from 26.7% to 81.5% for the chatbot, depending on the specific methods employed or the version that was tested. 7,15,16 The studies that compared the two versions of the chatbot consistently showed superior performance for version 4. 7,15 In our study, open-ended questions of varying complexity were posed to ChatGPT.…”
Section: Discussion
Mentioning (confidence: 99%)

“…The current ChatGPT version is not specialized in medical information and was trained on the entire Internet without controls for validity. 12,31 Moreover, it was trained only on data through 2021, although some studies suggest the model is self-learning from newer 2022 to 2023 data. 5 This creates risks of inaccuracy and a limited ability to incorporate emerging research.…”
Section: Discussion
Mentioning (confidence: 99%)

“…Misinformation on ChatGPT is apparently minimal but real. 5,6,12-14 Particularly worrisome is AI hallucination, where misinformation may arise from subjective, emotion-laden questioning. 3,4…”
Mentioning (confidence: 99%)