2023
DOI: 10.48550/arxiv.2303.04360
Preprint

Does Synthetic Data Generation of LLMs Help Clinical Text Mining?

Abstract: Recent advancements in large language models (LLMs) have led to the development of highly potent models like OpenAI's ChatGPT. These models have exhibited exceptional performance in a variety of tasks, such as question answering, essay composition, and code generation. However, their effectiveness in the healthcare sector remains uncertain. In this study, we seek to investigate the potential of LLMs to aid in clinical text mining by examining their ability to extract structured information from unstructured he…

Cited by 26 publications (30 citation statements) | References 32 publications
“…Improving LLMs for NER tasks requires at least further fine-tuning, but more likely supplying them with domain-specific training data for domains they are not trained on. In our case, not all biomedical literature is freely shareable, and it is therefore not possible to send these data to external platforms to train such models, a problem that is potentially solvable by generating synthetic data for closed systems. Another issue is the compute available to train such models, which, even with open LLMs such as LLaMa, require far more resources to train than BERT-based models.…”
Section: Results | Citation type: mentioning | Confidence: 99%
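As a rough illustration of the synthetic-data route mentioned in the statement above, the sketch below prompts an LLM to produce labeled clinical sentences that could later be converted into NER training data. It is a minimal sketch, not the cited paper's method: the `generate` callable stands in for whatever LLM interface is available, and the entity labels, prompt wording, and `[[...]]` markup are illustrative assumptions.

```python
# Minimal sketch: prompting an LLM to produce synthetic, labeled clinical
# sentences that can be turned into NER training data. `generate` is a
# placeholder for any LLM text-generation call; the entity labels, prompt
# wording, and [[...]] markup are illustrative assumptions.
from typing import Callable, List, Tuple

LABELS = ["PROBLEM", "TREATMENT", "TEST"]  # hypothetical entity types

def make_prompt(label: str, n: int = 5) -> str:
    return (
        f"Write {n} short, fully de-identified clinical sentences. "
        f"Wrap every {label} mention in [[...]]. Output one sentence per line."
    )

def parse_sentences(raw: str, label: str) -> List[Tuple[str, str]]:
    """Keep lines that contain [[...]]-marked spans and pair them with the label."""
    examples = []
    for line in raw.splitlines():
        line = line.strip()
        if "[[" in line and "]]" in line:
            examples.append((line, label))
    return examples

def build_synthetic_corpus(generate: Callable[[str], str]) -> List[Tuple[str, str]]:
    corpus: List[Tuple[str, str]] = []
    for label in LABELS:
        corpus.extend(parse_sentences(generate(make_prompt(label)), label))
    return corpus
```

Keeping the markup in the generated text makes it straightforward to convert each sentence into token-level BIO tags downstream.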
“…If there is a relation, then the label should be "Yes", otherwise "No". (Tang et al., 2023) HoC document: <text>; target: The correct category for this document is ? You must choose from the given list of answer categories (introduce what each category is ...)" (Chen et al., 2023) Table 4: The prompts used for different evaluation tasks and datasets.…”
Section: Metrics | Citation type: mentioning | Confidence: 99%
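The quoted prompts are easiest to read as fill-in templates. The snippet below shows one way such templates might be instantiated in code; the wording is paraphrased from the excerpt above rather than copied from the cited Table 4, and the example texts and category list are invented for illustration.

```python
# Sketch of prompt templates in the spirit of the ones quoted above.
# Wording is paraphrased; example inputs and categories are invented.
RELATION_TEMPLATE = (
    "Sentence: {text}\n"
    "Is there a relation between {head} and {tail}? "
    'If there is a relation, then the label should be "Yes", otherwise "No".'
)

HOC_TEMPLATE = (
    "document: {text}\n"
    "target: The correct category for this document is ? "
    "You must choose from the given list of answer categories: {categories}."
)

relation_prompt = RELATION_TEMPLATE.format(
    text="Aspirin reduced the patient's fever within two hours.",
    head="Aspirin",
    tail="fever",
)
hoc_prompt = HOC_TEMPLATE.format(
    text="The study reports sustained angiogenesis in the tumour samples.",
    categories=", ".join(["sustained angiogenesis", "evading apoptosis", "none"]),
)
```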
“…The research community explored GLLMs for data generation-based data augmentation in various NLP tasks like dialogue generation [410], training smaller LLMs [411], [416], common sense reasoning [412], hate speech detection [413], undesired content detection [414], question answering [415], [425], intent classification [143], relation extraction [155], [422], instruction tuning [417], [418], paraphrase detection [420], tweet intimacy prediction [421], named entity recognition [422], machine translation [424], etc. GLLM-based data generation for data augmentation is explored in multiple domains like general [143], [155], [412], [416]-[418], [420], [424]-[426], social media [409], [413], [414], [421], [423], news [423], scientific literature [155], [420], healthcare [410], [415], [422], dialogue [419], programming [411], etc. Table 19 presents a summary of research works exploring GLLMs for data generation-based data augmentation.…”
Section: Data Generation | Citation type: mentioning | Confidence: 99%
“…Based on the evaluation on four topic classification datasets, the authors observed that (i) the proposed approach enhances model performance and (ii) reduces the querying cost of ChatGPT by a large margin. Some research works explored GLLMs for data generation-based data augmentation in various information extraction tasks like relation extraction [155], relation classification [422] and named entity recognition [422]. Xu et al. [155] evaluated how effective the GPT-3.5 model is for relation classification.…”
Section: Data Generation | Citation type: mentioning | Confidence: 99%
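The augmentation pattern described in these statements (generate examples with a GLLM, then add them to a smaller model's training set) can be summarised in a few lines. The sketch below assumes the synthetic examples have already been generated; the field names, deduplication rule, and mixing ratio are assumptions, not details from the cited works.

```python
# Sketch of data-generation-based augmentation: mix gold and LLM-generated
# examples, with simple text-level deduplication and a cap on the synthetic
# share. Field names, the dedup rule, and the ratio are assumptions.
from typing import Dict, List

def augment(gold: List[Dict], synthetic: List[Dict], max_ratio: float = 1.0) -> List[Dict]:
    """Append at most max_ratio * len(gold) synthetic examples whose text
    does not already appear in the gold set."""
    seen = {ex["text"] for ex in gold}
    budget = int(max_ratio * len(gold))
    added: List[Dict] = []
    for ex in synthetic:
        if len(added) >= budget:
            break
        if ex["text"] not in seen:
            added.append(ex)
            seen.add(ex["text"])
    return gold + added

train_set = augment(
    gold=[{"text": "Metformin is used to treat type 2 diabetes.", "label": "TREATS"}],
    synthetic=[{"text": "Ibuprofen relieves joint pain.", "label": "TREATS"}],
)
```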