2022
DOI: 10.1016/j.simpa.2022.100373

Accurate Clinical and Biomedical Named Entity Recognition at Scale

Cited by 27 publications (10 citation statements). References 48 publications.
“…PSJH has an existing corpora of de-identified notes that were created using a sequence of operations performed on text data to remove PHI (protected health information) 34 . These operations included multiple pre-trained ML models and/or regular expressions.…”
Section: De-identification of Patient Notes (mentioning; confidence: 99%)
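The quoted pipeline removes PHI through a sequence of operations that may include regular expressions. As a minimal sketch of the regex half only, assuming three hypothetical PHI patterns (dates, phone numbers, medical record numbers) and no ML model, the steps can be applied in order, each replacing its matches with a category tag. This is not the PSJH implementation, whose rules and models are not described here.

```python
import re

# Hypothetical PHI patterns for illustration only; a production de-identification
# pipeline would combine many more rules with trained ML models.
PHI_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)),
]


def deidentify(text: str) -> str:
    """Apply each masking step in sequence, replacing matches with a [LABEL] tag."""
    for label, pattern in PHI_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen 03/14/2021, MRN: 445821. Call 555-123-4567 with results."
print(deidentify(note))  # Seen [DATE], [MRN]. Call [PHONE] with results.
```

Because the steps run in a fixed order, a later step never sees text an earlier step already masked; an ML-based tagger could be added as one more step in the same loop.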
“…[1][2][3][4][5][6] While rule-based models extract phenotypes based on pre-defined patterns, most machine learning and deep-learning approaches are trained on sentences or documents labeled with the relevant phenotypes and the model subsequently classifies texts into these phenotypes. 5,7 MedspaCy 6 and scispaCy 8 are two recent and extensively-used hybrid frameworks that utilize statistical and machine-learning methods in conjunction with rule-based NLP to identify clinical phenotypes.…”
Section: Introduction (mentioning; confidence: 99%)
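The contrast drawn above is that rule-based models match pre-defined patterns rather than learning from labeled text. A minimal sketch of the rule-based side, assuming a hypothetical two-entry phenotype lexicon and a much simplified NegEx-style negation cue (this is not the medspaCy or scispaCy API), could look like this:

```python
import re

# Hypothetical phenotype lexicon: each phenotype maps to pre-defined patterns.
PHENOTYPE_PATTERNS = {
    "diabetes": re.compile(r"\b(type [12] )?diabetes( mellitus)?\b|\bT2DM\b", re.IGNORECASE),
    "hypertension": re.compile(r"\bhypertension\b|\bHTN\b", re.IGNORECASE),
}
# Simplified negation rule: a negation cue within 30 characters before the
# mention, with no sentence-ending period in between.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]{0,30}$", re.IGNORECASE)


def extract_phenotypes(text: str) -> dict[str, bool]:
    """Return {phenotype: asserted?} for every phenotype mentioned in the text."""
    found: dict[str, bool] = {}
    for name, pattern in PHENOTYPE_PATTERNS.items():
        for match in pattern.finditer(text):
            negated = bool(NEGATION.search(text[: match.start()]))
            # The phenotype counts as present if any one mention is not negated.
            found[name] = found.get(name, False) or not negated
    return found


print(extract_phenotypes("Pt has T2DM. Denies hypertension."))
```

A statistical model would instead learn these decisions from labeled sentences; hybrid frameworks keep rules like these alongside the learned components.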
“…[1][2][3][4][5][6][7] While rule-based models extract phenotypes based on pre-defined patterns, most machine learning and deep-learning approaches are trained on sentences or documents labeled with the relevant phenotypes and the model subsequently classifies texts into these phenotypes. 5,8 SpaCy models, including MedspaCy 7 and scispaCy 9 are two recent and frequently used hybrid frameworks that utilize statistical and machine-learning named entity recognition methods in conjunction with rule-based NLP to identify clinical phenotypes. There are studies that have utilized medspaCy and scispaCy to identify specific sections within EHR text for NER, extract phenotypes from relation extraction documents, and generate text embeddings.…”
Section: Introduction (mentioning; confidence: 99%)
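One use mentioned above is identifying specific sections within EHR text before running NER. A minimal sketch, assuming a hypothetical list of four section headers written as "Header:" at the start of a line (clinical sectionizers ship far larger header lists and are not implemented this way verbatim), splits a note so that an NER model can be applied to one section at a time:

```python
import re

# Hypothetical header list; each header must start a line and end with a colon.
HEADER = re.compile(
    r"^(chief complaint|history of present illness|medications|assessment and plan):",
    re.IGNORECASE | re.MULTILINE,
)


def split_sections(note: str) -> dict[str, str]:
    """Map each recognized header (lower-cased) to the text up to the next header.

    Any text before the first recognized header is dropped in this sketch.
    """
    matches = list(HEADER.finditer(note))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[match.group(1).lower()] = note[match.end():end].strip()
    return sections


note = "Chief Complaint: chest pain\nMedications: aspirin 81 mg\nAssessment and Plan: rule out MI"
print(split_sections(note)["medications"])  # aspirin 81 mg
```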
“…It was estimated in 2017 that the footprint of medical data would double every 73 days by 2020 and continue to increase exponentially, and an estimated 80% of this data would be unstructured (2). However, this form of data remains largely inaccessible to statistical analysis (3,4). Furthermore, manual extraction of this data is costly and time consuming (5,6).…”
Section: Introduction (mentioning; confidence: 99%)
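The quoted doubling time can be turned into an annual growth factor. This checks only the arithmetic, not the underlying estimate. With an initial volume $V_0$ and time $t$ in days:

```latex
V(t) = V_0 \cdot 2^{t/73},
\qquad
\frac{365\ \text{days}}{73\ \text{days per doubling}} = 5\ \text{doublings per year},
\qquad
\frac{V(365)}{V_0} = 2^{5} = 32.
```

So a 73-day doubling time corresponds to roughly a 32-fold increase per year.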
“…Spark NLP is a widely used library by healthcare organizations for NLP pipelines that are accurate and scale easily in a distributed environment (3). The Spark NLP Named Entity Recognition (NER) models use a BiLSTM-CNN-Char deep neural network architecture (14).…”
Section: Introduction (mentioning; confidence: 99%)
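In a BiLSTM-CNN-Char tagger, a character-level CNN builds a feature vector for each word, which is concatenated with the word embedding and fed to the BiLSTM. The sketch below shows only that character-level step in plain Python. The dimensions, the `<pad>` token, and the randomly initialized weights are hypothetical, and the BiLSTM and output layers are omitted. It is not the Spark NLP implementation.

```python
import random

random.seed(0)

EMB_DIM = 4      # hypothetical character-embedding size
NUM_FILTERS = 3  # hypothetical number of convolution filters
WIDTH = 3        # filter width in characters

# Toy, randomly initialized parameters; a trained model would learn these.
char_emb: dict[str, list[float]] = {}
filters = [
    [[random.uniform(-1, 1) for _ in range(EMB_DIM)] for _ in range(WIDTH)]
    for _ in range(NUM_FILTERS)
]


def embed(ch: str) -> list[float]:
    """Look up (or lazily create) the embedding for one character or pad token."""
    if ch not in char_emb:
        char_emb[ch] = [random.uniform(-1, 1) for _ in range(EMB_DIM)]
    return char_emb[ch]


def char_cnn_features(word: str) -> list[float]:
    """Convolve each filter over padded character embeddings, then max-pool.

    Assumes a non-empty word; padding gives one window per character.
    """
    pad = ["<pad>"] * (WIDTH // 2)
    vectors = [embed(c) for c in pad + list(word) + pad]
    features = []
    for f in filters:
        scores = [
            sum(w * x for fw, vec in zip(f, vectors[s:s + WIDTH]) for w, x in zip(fw, vec))
            for s in range(len(vectors) - WIDTH + 1)
        ]
        features.append(max(scores))  # keep the strongest response of this filter
    return features


def token_representation(word: str, word_vector: list[float]) -> list[float]:
    """Concatenate a word embedding with its character features (the BiLSTM input)."""
    return word_vector + char_cnn_features(word)


print(len(token_representation("aspirin", [0.1, 0.2])))  # 2 + NUM_FILTERS = 5
```

Max-pooling makes the character features a fixed length regardless of word length, which is why they can be concatenated directly with a fixed-size word vector.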