2020
DOI: 10.20944/preprints202010.0649.v1
Preprint

Modern Clinical Text Mining: A Guide and Review

Abstract: Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an over…

Cited by 5 publications (10 citation statements: 0 supporting, 10 mentioning, 0 contrasting), published in 2021–2022. References 45 publications.

Citation statements, ordered by relevance:
“…For details of studies from before 2010 and/or focusing on clinical speech, we recommend other excellent reviews. 1,3–5,9 The PRISMA flow chart in Figure 2 details the search results and inclusion/exclusion selection criteria. From the initial 1812 records retrieved from the databases, 561 duplicates were automatically removed, and a further 880 were removed after manual screening of titles and abstracts based on specific inclusion/exclusion criteria, leaving 371 in-scope articles.…”
Section: Methods (mentioning)
confidence: 99%
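As a quick sanity check on the screening counts quoted above (this check is ours, not part of the cited review), the numbers are internally consistent:

1812 records − 561 duplicates − 880 excluded at title/abstract screening = 371 in-scope articles.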
“…For example, large general-purpose language models of the kind of BERT [1]- or GPT [2,3]-inspired architectures are commonly trained on large corpora such as Common Crawl [4] or The Pile [5], which comprise 320 TiB (Common Crawl) or 825 GiB (The Pile) of raw text data. Since data at this scale is infeasible to annotate, these datasets are mainly used for unsupervised methods such as pretraining [6]. However, for case-specific downstream tasks, well-suited annotated datasets are used for fine-tuning in a supervised fashion [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…Since data at this scale is infeasible to annotate, these datasets are mainly used for unsupervised methods such as pretraining [6]. However, for case-specific downstream tasks, well-suited annotated datasets are used for fine-tuning in a supervised fashion [6]. In this context, the dataset must be annotated for the task at hand.…”
Section: Introduction (mentioning)
confidence: 99%
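The two statements above describe the standard pretrain-then-fine-tune workflow. The sketch below illustrates it using the Hugging Face transformers library; it is not code from the cited papers, and the model name ("bert-base-uncased"), the toy note texts, and the labels are hypothetical placeholders standing in for a case-specific annotated dataset.

# Minimal sketch: supervised fine-tuning of a pretrained transformer on a
# small annotated dataset. Texts, labels, and model choice are illustrative.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

texts = ["Patient reports chest pain.", "No acute distress noted."]  # placeholder notes
labels = [1, 0]                                                      # placeholder annotations

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=2)

class AnnotatedNotes(Dataset):
    """Wraps annotated text snippets as token IDs plus labels for the Trainer."""
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

args = TrainingArguments(output_dir="finetune_out",
                         num_train_epochs=1,
                         per_device_train_batch_size=2,
                         logging_steps=1)
trainer = Trainer(model=model, args=args,
                  train_dataset=AnnotatedNotes(texts, labels))
trainer.train()  # supervised fine-tuning on the case-specific annotations

In practice the pretrained weights supply general language knowledge learned without labels, and only the comparatively small annotated dataset is needed for the downstream task.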
“…Nevertheless, analyzing and processing this data is difficult and necessitates the use of specialized software. Data mining (DM) methods have tremendous potential for analyzing such huge amounts of stored biomedical data in an attempt to uncover knowledge, as demonstrated by the success stories discussed by Percha in [2]. Knowledge discovery in databases (KDD) is a broad term for the procedure of extracting usable, tacit, and previously undiscovered knowledge from massive data sets.…”
Section: Introduction (mentioning)
confidence: 99%