2024
DOI: 10.1016/j.clinimag.2024.110101
When vision meets reality: Exploring the clinical applicability of GPT-4 with vision

Jiawen Deng, Kiyan Heybati, Matthew Shammas-Toma
Cited by 16 publications (6 citation statements)
References 15 publications
“…This is in line with several prior studies reporting low diagnostic performance of GPT-4(V) with radiological images alone as input (12,18,20,21). Consistent with the findings of Schubert et al, we observed that the combination of medical history and radiological images yielded higher diagnostic accuracy than images alone, although the difference did not reach statistical significance (17).…”
Section: Discussion (supporting)
confidence: 92%
“…For the binary scoring system, a chi-square test was performed over all groups with subsequent pairwise testing. For the numeric scoring system, a Kruskal-Wallis test was conducted across all groups followed by pairwise testing using Dunn’s test (21). For both scoring systems, results were adjusted to a false-discovery rate of 0.05 by employing the Benjamini-Hochberg procedure (22).…”
Section: Methods (mentioning)
confidence: 99%
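For context, the procedure described in this Methods excerpt (an omnibus chi-square test with pairwise follow-ups for the binary score, Kruskal-Wallis with Dunn's pairwise tests for the numeric score, and Benjamini-Hochberg adjustment at a false-discovery rate of 0.05) can be sketched in Python as follows. This is a minimal illustration only, not code from the cited study; the example DataFrame and the `group`, `correct`, and `score` column names are assumptions.

```python
# Sketch of the quoted analysis pipeline: chi-square over all groups with
# pairwise follow-ups, Kruskal-Wallis with Dunn's post-hoc tests, and
# Benjamini-Hochberg FDR adjustment at 0.05. Data layout and column names
# are assumed, not taken from the cited study.
import pandas as pd
from itertools import combinations
from scipy.stats import chi2_contingency, kruskal
import scikit_posthocs as sp
from statsmodels.stats.multitest import multipletests

# Hypothetical long-format ratings: one row per scored model response.
df = pd.DataFrame({
    "group":   ["A"] * 6 + ["B"] * 6 + ["C"] * 6,
    "correct": [1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1],  # binary score
    "score":   [4, 5, 2, 4, 3, 5, 2, 1, 3, 2, 2, 3, 5, 4, 4, 3, 5, 4],  # numeric score
})

# Binary scoring: chi-square test over all groups ...
table = pd.crosstab(df["group"], df["correct"])
chi2, p_all, _, _ = chi2_contingency(table)
print(f"Omnibus chi-square p = {p_all:.3f}")

# ... with subsequent pairwise chi-square tests.
pairs = list(combinations(table.index, 2))
pair_p = []
for a, b in pairs:
    _, p, _, _ = chi2_contingency(table.loc[[a, b]])
    pair_p.append(p)

# Benjamini-Hochberg procedure at an FDR of 0.05.
reject, p_adj, _, _ = multipletests(pair_p, alpha=0.05, method="fdr_bh")
for (a, b), p, r in zip(pairs, p_adj, reject):
    print(f"{a} vs {b}: adjusted p = {p:.3f}, significant = {r}")

# Numeric scoring: Kruskal-Wallis across all groups, then Dunn's pairwise
# tests with the same FDR correction applied to the pairwise p-values.
samples = [g["score"].values for _, g in df.groupby("group")]
print(f"Kruskal-Wallis p = {kruskal(*samples).pvalue:.3f}")
dunn = sp.posthoc_dunn(df, val_col="score", group_col="group", p_adjust="fdr_bh")
print(dunn)
```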
“…Similar to hallucination rates described previously (25), we found statements inconsistent with widely accepted medical knowledge in 5.1% of LLM responses. Many of these involved incorrect interpretations of MRI screenshots provided as input, confirming earlier studies demonstrating low performance of current state-of-the-art LLMs in diagnostic tasks based on radiological images (27–30). Interestingly, hallucinations were even found with PerplexityAI, which, unlike other chatbots such as ChatGPT, combines LLMs with real-time information retrieval from the internet to support its answers with relevant sources (31).…”
Section: Discussion (supporting)
confidence: 77%