Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2022.emnlp-main.130

Large language models are few-shot clinical information extractors

Abstract: A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT (Ouyang et al., 2022), perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and gener…

Cited by 111 publications (25 citation statements)
References: 0 publications
“…One of the key aspects of prompt engineering is the number of examples or shots that are provided to the model along with the prompt. Few-shot prompting is a technique that provides the model with a few examples of input-output pairs, while zero-shot prompting does not provide any examples [ 3 , 18 ]. By contrasting these strategies, we aim to shed light on the most efficient and effective ways to leverage prompt engineering in clinical NLP.…”
Section: Introduction (mentioning)
Confidence: 99%
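The zero-shot vs. few-shot distinction described in the statement above comes down to whether input-output demonstrations are prepended to the instruction. A minimal, illustrative sketch (not from the paper; the task wording, example pairs, and helper name are hypothetical):

```python
# Illustrative sketch: assembling zero-shot vs. few-shot prompts for a
# clinical extraction task. All strings here are made-up examples.

def build_prompt(instruction, query, examples=None):
    """Assemble a prompt string.

    With examples=None this is a zero-shot prompt (instruction + query only);
    with a list of (input, output) pairs it becomes a few-shot prompt.
    """
    parts = [instruction]
    for inp, out in (examples or []):
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

instruction = "Extract the medication names mentioned in the clinical note."
note = "Pt started on lisinopril 10mg daily."

# Zero-shot: no demonstrations, just the instruction and the query.
zero_shot = build_prompt(instruction, note)

# Few-shot: the same prompt with one input-output demonstration prepended.
few_shot = build_prompt(
    instruction,
    note,
    examples=[("Continue metformin 500mg BID.", "metformin")],
)
```

Either string would then be sent to the model as-is; the only difference between the two strategies is the demonstration block.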
“…However, these large-scale data are often unstructured, requiring extensive processing and labeling, which poses the most significant bottleneck (12). In a precise field such as medicine, errors in the labeling and preprocessing process can lead to poor outcomes in terms of the reliability of AI models and the impact of model results (11, 13-17). Therefore, domain experts are often employed for labeling tasks in the present day, a process that is both time-consuming and costly (6, 18).…”
Section: Introduction (mentioning)
Confidence: 99%
“…Unstructured EHRs are characterized by a wide array of data formats, including free-text clinical notes, laboratory findings, and imaging narratives. Each of these formats exhibits unique terminological and syntactical features, ambiguous jargon, and nonstandard phrasal structures (17, 19-23). To mitigate such complexity, the encoding of patients' diseases in EHRs using universally accepted disease classification coding systems such as the International Classification of Disease (ICD) facilitates the clustering of patients, providing convenience.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Instruction-tuned large language models (LLMs) have been successful at knowledge retrieval (1-4), text extraction (5-9), summarization (10-12), and reasoning (13-17) tasks without requiring domain-specific fine-tuning. Prompting LLMs with instruction and data contexts described in natural language has emerged as a means for task and domain specification as well as controllability of model behaviors.…”
Citation type: mentioning
Confidence: 99%
“…Because there is no single postoperative outcome measure of risk, LLM capabilities were surveyed on 8 different tasks: (1) assignment of the American Society of Anesthesiologists Physical Status (ASA-PS) classification (25-27), (2) prediction of postanesthesia care unit (PACU) phase 1 duration, (3) hospital admission, (4) hospital duration, (5) intensive care unit (ICU) admission, (6) ICU duration, (7) whether the patient will have an unanticipated hospital admission, and (8) whether the patient will die in the hospital. The LLM-generated responses were compared against ground-truth labels extracted from patients' EHR, and performance metrics were reported based on this comparison (Figure 1).…”
Citation type: mentioning
Confidence: 99%