2019
DOI: 10.1200/cci.19.00057

Clinical Data Extraction and Normalization of Cyrillic Electronic Health Records Via Deep-Learning Natural Language Processing

Abstract: PURPOSE: A substantial portion of medical data is unstructured. Extracting data from unstructured text presents a barrier to advancing clinical research and improving patient care. In addition, ongoing studies have focused predominantly on the English language, whereas inflected languages with non-Latin alphabets (such as Slavic languages with a Cyrillic alphabet) present numerous linguistic challenges. We developed deep-learning–based natural language processing algorithms for automatically extracting biom…

Cited by 15 publications (17 citation statements). References 15 publications.

“…This study was designed to evaluate the possibility of automatically extracting the status of the 3 main breast cancer biomarkers (ER, PR, and HER2) from the contents of pathology reports written in two different languages, and coming from 82 different providers, using conventional machine learning models. After testing different classifiers, the best performing ones achieved macro-averaged F1 scores ranging from 0.89 to 0.92 on the held-out test sets, which is on par with best efforts in the literature (6,7,11,12). The reported F1 scores in the literature range between 0.87 and 1, but use only three possible labels for HER2, whereas five are used in the present work.…”
Section: Discussion (supporting)
confidence: 58%
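The passage above compares macro-averaged F1 scores. The following is a minimal sketch and not the code used in either study. Macro-averaging computes F1 separately for each label from its true positives, false positives, and false negatives, then takes the unweighted mean, so a rare label counts as much as a common one. This is one reason scores over three HER2 labels and over five may not be directly comparable. The function name `macro_f1`, the placeholder labels, the choice to average over the union of true and predicted labels, and treating a zero denominator as 0.0 are all assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (hypothetical helper).

    Assumptions: labels are the union of true and predicted labels,
    and any 0/0 precision, recall, or F1 is treated as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    per_label_f1 = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_label_f1.append(f1)
    # Each label contributes equally, regardless of how often it occurs.
    return sum(per_label_f1) / len(per_label_f1)


if __name__ == "__main__":
    # Placeholder labels "A", "B", "C" (not actual biomarker categories).
    y_true = ["A", "A", "B", "C"]
    y_pred = ["A", "B", "B", "C"]
    print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```

Here labels A and B each score F1 = 2/3 and label C scores 1, so the macro average is 7/9. A frequency-weighted average would give a different number whenever per-label scores differ.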
“…The extraction is performed at a national level, involving 82 different data providers (all Belgian laboratories for pathological anatomy). This problem has not been addressed in the previously mentioned studies, which focused on a limited number of data sources (6,11–13).…”
Section: Introduction (mentioning)
confidence: 99%
“…In terms of data interpretation and translation, we recommend the use of tools to extract and represent the medical substrate by synthesizing only relevant aspects in a declarative way. ML techniques (ie, deep learning—recurrent networks with word embeddings and distributed representations) can handle very large and sparse data (eg, device data may only be available for a small subset of individuals) to capture the sequential character of the data and are suitable for modeling context dependencies in inputs [ 57 ]. Such systems, which incorporate word embeddings encoding syntactic and polarity information in the language followed by deep neural network architectures, are already used to extract and normalize parameters within oncology care data.…”
Section: Results (mentioning)
confidence: 99%
“…In the era of innovations in digital technologies and value-based patient care substantial part of the medical information is still unstructured. Extraction and systematization of information from medical records are of a great significance for improvement of diagnosis, treatment, survival prediction, resource allocation and decision making [10]. Hence, in recent years there is an increased interest towards the term 'big data', its interpretation and potential use in health economics and outcomes research (HEOR) [11].…”
Section: Introduction (mentioning)
confidence: 99%