2023
DOI: 10.1097/upj.0000000000000406

RETRACTED: New Artificial Intelligence ChatGPT Performs Poorly on the 2022 Self-assessment Study Program for Urology

Abstract: Introduction: Large language models have demonstrated impressive capabilities, but their application to medicine remains unclear. We seek to evaluate the use of ChatGPT on the American Urological Association Self-assessment Study Program as an educational adjunct for urology trainees and practicing physicians. Methods: One hundred fifty questions from the 2022 Self-assessment Study Program exam were screened, and those containing visual assets (n=15) were removed. The remaining items were encoded as open-ended or multi…

Cited by 44 publications (24 citation statements). References 17 publications.

Citation statements:

“…Overall, the results of this study align well with our previous exploration 3 of ChatGPT: not only did Google Bard perform poorly on the 2022 SASP for urology, but Bard’s responses also revealed concerning flaws. Although Bard is claimed to gather research and cite sources with real-time web searches, it still “hallucinates” facts (like ChatGPT).…”
Supporting (confidence: 88%)

“…Unlike ChatGPT, Bard boasts the ability to cite sources, interpret images, and perform real-time web searches for up-to-date information. Given our previous exploration 3 of AI performance on the 2022 AUA Self-Assessment Study Program (SASP), 4 we sought to evaluate Bard’s performance and assess its utility as an educational tool for urology trainees and physicians.…”
Mentioning (confidence: 99%)

“…6 Other studies have used more complex questions and clinical scenarios and reported accuracy rates ranging from 26.7% to 81.5% for the chatbot, depending on the specific methods employed or the version that was tested. 7,15,16 The studies that compared the two versions of the chatbot consistently showed superior performance for version 4. 7,15 In our study, open-ended questions of varying complexity were posed to ChatGPT.…”
Section: Discussion
Mentioning (confidence: 99%)

“…The current ChatGPT version is not specialized in medical information and was trained on the entire Internet without controls for validity. 12,31 Moreover, it was trained only on data through 2021, although some studies suggest the model is self-learning from newer 2022 to 2023 data. 5 This creates risks of inaccuracy and a limited ability to incorporate emerging research.…”
Section: Discussion
Mentioning (confidence: 99%)

“…Misinformation on ChatGPT is apparently minimal but real. 5,6,12-14 Particularly worrisome is AI hallucination, where misinformation may arise from subjective, emotion-laden questioning. 3,4…”
Mentioning (confidence: 99%)