2023
DOI: 10.1101/2023.07.13.23292613
Preprint

On the limitations of large language models in clinical diagnosis

Abstract: Background: The potential of large language models (LLM) such as GPT to support complex tasks such as differential diagnosis has been a subject of debate, with some ascribing near sentient abilities to the models and others claiming that LLMs merely perform "autocomplete on steroids". A recent study reported that the Generative Pretrained Transformer 4 (GPT-4) model performed well in complex differential diagnostic reasoning. The authors assessed the performance of GPT-4 in identifying the correct diagnosis in…

Cited by 15 publications (20 citation statements)
References 21 publications
“…For example, some investigators have shown that they can offer tailored medical guidance [ 12 ], distribute educational resources [ 7 ], and improve the quality of medical training [ 7 , 13 , 14 ]. These tools can also support clinical decision making [ 15 , 16 , 17 ], help identify urgent medical situations [ 18 ], and respond to patient inquiries with understanding and empathy [ 19 , 20 , 21 ]. Extensive research has shown that ChatGPT, particularly its most recent version GPT-4, excels across various standardized tests.…”
Section: Introduction (mentioning)
confidence: 99%
“…This is because currently interactions with ChatGPT are transmitted over the internet. 83 Additionally, if limited to basic (non-identifying) diagnostic information, the performance of ChatGPT degrades considerably. 83 These and related issues have influenced the development of LLMs derived specifically from deidentified clinical notes as well as other relevant sources.…”
Section: Introduction (mentioning)
confidence: 99%
“… 83 Additionally, if limited to basic (non-identifying) diagnostic information, the performance of ChatGPT degrades considerably. 83 These and related issues have influenced the development of LLMs derived specifically from deidentified clinical notes as well as other relevant sources. 82 Despite general advances in LLM capabilities, the robust application to biomedical and clinical text is likely a mid-term beneficiary of AI/ML advancements.…”
Section: Introduction (mentioning)
confidence: 99%
“…While these models are robust in representing the behavior of the training data, they are lighter and easier to diagnose compared to LLMs. Concerns around the interpretability and potential biases of large black-box models like LLMs could make simpler, more transparent ML models preferable in high-stakes domains like healthcare diagnosis, where clinician oversight and autonomy are important (Reese et al, 2023; Fisch, 2024; Κούρου et al, 2015; Kavakiotis et al, 2017; Salvatore et al, 2014; Malek, 2023; Zack et al, 2023; Abu-Jeyyab, 2023; Huang et al, 2023; Takahashi, 2023). Lehman et al (2023) explore whether LLMs, trained primarily with general web text, are suitable for specialized domains like clinical text.…”
Section: Introduction (mentioning)
confidence: 99%