2018
DOI: 10.1007/s10278-018-0105-8

Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning

Abstract: Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, large-scale labeled datasets, which are difficult to obtain, are required. We aim to remove many barriers of dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert review…

Cited by 21 publications (17 citation statements)
References 28 publications (29 reference statements)
“…Our hybrid NLP pipeline leveraged the complementary strengths of both the rule‐based and the CNN models in classifying the severity of UI and produced higher performance than CNN only approach. Therefore, we believe our semiautomated iterative approach might produce optimal and replicable results in other settings [31]. Moreover, since the terms for UI severity classification in the model are not unique to prostate cancer patients, the model likely could be used to identify UI and its severity in men and women.…”
Section: Discussion (mentioning)
confidence: 93%
“…The lower performance might be attributed to the paucity of training set which includes less than 100 examples for each class. Limitations of machine learning methods on small sample sets have emerged as a significant challenge in recent studies, necessarily due to limited population sizes available in single EHRs where studies are interrogating single diseases, symptoms, or outcomes [30,31]. Hence, researchers have gravitated toward creating hybrid frameworks using traditional NLP and machine learning solutions [31] …”
Section: Discussion (mentioning)
confidence: 99%
“…As NLP and deep learning algorithms increase in sophistication with a growing body of clinical data available from EHR and online websites, we imagine that other cancer drug repurposing instances will arise from similar endeavors. There has also been an increased involvement of large private entities such as Google and Amazon that have developed their own deep learning platforms (Google AutoML and Amazon Comprehend Medical) and application programming interface (APIs) (91). With involvement of these entities and their supercomputing capabilities, identification of new drug repurposing opportunities through NLP/deep learning will undoubtedly be accelerated.…”
Section: EHR-based Machine Learning (mentioning)
confidence: 99%
“…[3] Potential use cases for word embeddings include similarity searches (may be used to look for cases worded similarly to a particular report), case classification/categorization, and automated labeling of free-text pathology reports.[4] There are a multitude of papers applying NLP techniques to free-text data (e.g., patient encounter notes in electronic medical records [5] and radiology reports [6]) but there are currently few NLP publications related to the field of anatomic pathology. Currently, many studies center on using machine learning methods such as convolutional neural networks for image classification,[7] segmentation,[8] or stain normalization.…”
Section: Introduction (mentioning)
confidence: 99%