2021
DOI: 10.1200/cci.20.00173

Natural Language Processing to Identify Cancer Treatments With Electronic Medical Records

Abstract: PURPOSE Knowing the treatments administered to patients with cancer is important for treatment planning and correlating treatment patterns with outcomes for personalized medicine study. However, existing methods to identify treatments are often lacking. We develop a natural language processing approach with structured electronic medical records and unstructured clinical notes to identify the initial treatment administered to patients with cancer. METHODS We used a total number of 4,412 patients with 483,782 cl…

Cited by 29 publications (24 citation statements)
References: 57 publications
“…We first pre-processed the unstructured data, including word segmentation and removal of stop words. Then we used TaggedDocument in the gensim package to wrap the input sentence and change it to the input sample format required by Doc2Vec [25, 26]. After that, we loaded the Doc2Vec model with window size of 3 and started training, and finally we mapped the unstructured data into 128-dimensional paragraph vectors and made further predictions.…”
Section: Methods
Citation type: mentioning
confidence: 99%
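
The pipeline described in the citation statement above (word segmentation, stop-word removal, wrapping with TaggedDocument, Doc2Vec training with a window of 3 and 128-dimensional output) can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the stop-word list, the regular-expression tokenizer, the helper names `preprocess` and `train_paragraph_vectors`, and the `min_count` and `epochs` values are all assumptions made for the example. Only the window size, the vector dimension, and the use of gensim's TaggedDocument and Doc2Vec come from the quoted text.

```python
import re

# Hypothetical stop-word list; the quoted statement does not say which list was used.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was", "with"}


def preprocess(note):
    """Lowercase a note, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", note.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train_paragraph_vectors(notes, vector_size=128, window=3, epochs=20):
    """Train Doc2Vec on a list of note strings.

    Returns the trained model and one inferred vector per note.
    vector_size and window follow the quoted statement; epochs and
    min_count are assumed values chosen for illustration.
    """
    # Imported lazily so preprocess() can be used without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Wrap each tokenized note in the sample format Doc2Vec expects.
    corpus = [
        TaggedDocument(words=preprocess(note), tags=[i])
        for i, note in enumerate(notes)
    ]

    model = Doc2Vec(
        vector_size=vector_size,
        window=window,
        min_count=1,  # assumed: keep rare tokens in a small corpus
        epochs=epochs,
    )
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

    # Map each note to a fixed-length paragraph vector.
    vectors = [model.infer_vector(doc.words) for doc in corpus]
    return model, vectors
```

With gensim installed, calling `train_paragraph_vectors(list_of_note_strings)` returns one 128-dimensional vector per note, which could then be passed to a downstream predictive model as the statement describes.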
“…In the end, we have 9 structured covariates for prostate cancer and 7 structured covariates for NSCLC. [7, 8].…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…In the end, we have 9 structured covariates for prostate cancer and 7 structured covariates for NSCLC. While billing codes can be used to generate additional structured features for diagnosis and past treatments, existing studies have found these can be unreliable [27, 28]. Hence, we chose to focus mainly on clinical notes to capture additional information that can influence survival time, such as patient symptoms and performance status.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Observational studies are more reliable when we can better control for these confounders. While structured EMR data, such as billing codes, can be used to encode expert-curated patient characteristics, studies suggest that administrative claims data may contain errors [27, 28] and expert-curated covariates may not capture all potential confounding [7, 29]. EMR clinical text is a potential source of additional information about factors that might relate to both treatment assignment and prognosis.…”
Section: Introduction
Citation type: mentioning
confidence: 99%