2024
DOI: 10.1101/2024.04.15.24305869
Preprint

A Systematic Review of Testing and Evaluation of Healthcare Applications of Large Language Models (LLMs)

Suhana Bedi,
Yutong Liu,
Lucy Orr-Ewing
et al.

Abstract: Importance: Large Language Models (LLMs) can assist in a wide range of healthcare-related activities. Current approaches to evaluating LLMs make it difficult to identify the most impactful LLM application areas. Objective: To summarize the current evaluation of LLMs in healthcare in terms of 5 components: evaluation data type, healthcare task, Natural Language Processing (NLP)/Natural Language Understanding (NLU) task, dimension of evaluation, and medical specialty. Data Sources: A systematic search of PubMed …

Cited by 4 publications (3 citation statements)
References 59 publications
“…10 A systematic review by Bedi et al highlights the various healthcare applications of LLMs. 11 Studies have demonstrated their utility in tasks such as diagnosis 12 , medical report generation 13 , treatment recommendations 14 , and clinical referrals. 14,15 16 Additionally, Fraser et al studied the diagnostic accuracy of LLMs, providing further evidence of their potential in clinical settings.…”
Section: LLMs in Healthcare (mentioning)
confidence: 99%
“…Studies have noted that while LLMs perform well in tasks like answering medical exam questions, their application in direct patient care and other complex medical scenarios remains underexplored and often lacks integration with real patient data. [3]…”
Section: Introduction (mentioning)
confidence: 99%
“…Studies have noted that while LLMs perform well in tasks like answering medical exam questions, their application in direct patient care and other complex medical scenarios remains underexplored and often lacks integration with real patient data. [3] Moreover, the evaluation of these models in healthcare has often focused narrowly on specific tasks, such as NLP tasks related to summarization and conversation, without a broad application across various medical specialties. This has limited the understanding of their broader potential and areas where they may not perform as expected.…”
Section: Introduction (mentioning)
confidence: 99%