Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance

Wei, Wei‐Qi; Teixeira, Pedro L.; Mo, Huan; Cronin, Robert M.; Warner, Jeremy L.; Denny, Joshua C.

doi:10.1093/jamia/ocv130

Cited by 169 publications

(175 citation statements)

References 42 publications



“…Furthermore, structured data elements, such as International Classification of Diseases billing codes, can still be subject to high error rates and are often not sufficient for phenotyping activities. 22 …”

Section: Discussion (mentioning)

confidence: 99%

Automating the Determination of Prostate Cancer Risk Strata From Electronic Medical Records

Gregg, Lang, Wang, et al. 2017

JCO Clinical Cancer Informatics


Purpose Risk stratification underlies system-wide efforts to promote the delivery of appropriate prostate cancer care. Although the elements of risk stratum are available in the electronic medical record, manual data collection is resource intensive. Therefore, we investigated the feasibility and accuracy of an automated data extraction method using natural language processing (NLP) to determine prostate cancer risk stratum. Methods Manually collected clinical stage, biopsy Gleason score, and preoperative prostate-specific antigen (PSA) values from our prospective prostatectomy database were used to categorize patients as low, intermediate, or high risk by D’Amico risk classification. NLP algorithms were developed to automate the extraction of the same data points from the electronic medical record, and risk strata were recalculated. The ability of NLP to identify elements sufficient to calculate risk (recall) was calculated, and the accuracy of NLP was compared with that of manually collected data using the weighted Cohen’s κ statistic. Results Of the 2,352 patients with available data who underwent prostatectomy from 2010 to 2014, NLP identified sufficient elements to calculate risk for 1,833 (recall, 78%). NLP had a 91% raw agreement with manual risk stratification (κ = 0.92; 95% CI, 0.90 to 0.93). The κ statistics for PSA, Gleason score, and clinical stage extraction by NLP were 0.86, 0.91, and 0.89, respectively; 91.9% of extracted PSA values were within ± 1.0 ng/mL of the manually collected PSA levels. Conclusion NLP can achieve more than 90% accuracy on D’Amico risk stratification of localized prostate cancer, with adequate recall. This figure is comparable to other NLP tasks and illustrates the known trade-off between recall and accuracy. Automating the collection of risk characteristics could be used to power real-time decision support tools and scale up quality measurement in cancer care.


“…19,20 Each data source poses unique challenges, and use of multiple data sources often improves performance. 21 Billing code-based phenotyping methods have variable performance with estimates for cardiovascular and stroke risk factors ranging from 0.55 to 0.95 positive predictive value (PPV). 22 Similarly, various phenotyping studies have used natural language processing (NLP)-extracted concepts alone, with sensitivities ranging from 72% to 99.6% and PPV between 63% and 100%.…”

Section: Background and Significance (mentioning)

confidence: 99%

Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals

Teixeira, Wei, Cronin, et al. 2016

Journal of the American Medical Informatics Association

Self Cite


Objective: Phenotyping algorithms applied to electronic health record (EHR) data enable investigators to identify large cohorts for clinical and genomic research. Algorithm development is often iterative, depends on fallible investigator intuition, and is time- and labor-intensive. We developed and evaluated 4 types of phenotyping algorithms and categories of EHR information to identify hypertensive individuals and controls and provide a portable module for implementation at other sites. Materials and Methods: We reviewed the EHRs of 631 individuals followed at Vanderbilt for hypertension status. We developed features and phenotyping algorithms of increasing complexity. Input categories included International Classification of Diseases, Ninth Revision (ICD9) codes, medications, vital signs, narrative-text search results, and Unified Medical Language System (UMLS) concepts extracted using natural language processing (NLP). We developed a module and tested portability by replicating 10 of the best-performing algorithms at the Marshfield Clinic. Results: Random forests using billing codes, medications, vitals, and concepts had the best performance with a median area under the receiver operator characteristic curve (AUC) of 0.976. Normalized sums of all 4 categories also performed well (0.959 AUC). The best non-NLP algorithm combined normalized ICD9 codes, medications, and blood pressure readings with a median AUC of 0.948. Blood pressure cutoffs or ICD9 code counts alone had AUCs of 0.854 and 0.908, respectively. Marshfield Clinic results were similar. Conclusion: This work shows that billing codes or blood pressure readings alone yield good hypertension classification performance. However, even simple combinations of input categories improve performance. The most complex algorithms classified hypertension with excellent recall and precision.


“…The time period is three months (90 days) before the occurrence of the target ADE event, i.e., up to but not including the time point when the target ADE has been assigned. 4 Basic descriptions of each dataset, including class label, the number of positive and negative examples, and the number of involved clinical measurements are presented in Table 1.…”

Section: Data Source (mentioning)

confidence: 99%

“…To tackle the integration problem, studies have been conducted on the knowledge level and the data level, respectively. Some rely on domain knowledge to extract a joint patient cohort by defining criteria from different data types [4], while others explore the possibility of integrating heterogeneous EHR data prior to or post modeling [5,6,7,8]. The focus of this study is on the latter: analyzing complex longitudinal data.…”

Section: Introduction (mentioning)

confidence: 99%

Learning from heterogeneous temporal data in electronic health records

Zhao, Papapetrou, Asker, et al. 2017

Journal of Biomedical Informatics


Electronic health records contain large amounts of longitudinal data that are valuable for biomedical informatics research. The application of machine learning is a promising alternative to manual analysis of such data. However, the complex structure of the data, which includes clinical events that are unevenly distributed over time, poses a challenge for standard learning algorithms. Some approaches to modeling temporal data rely on extracting single values from time series; however, this leads to the loss of potentially valuable sequential information. How to better account for the temporality of clinical data, hence, remains an important research question. In this study, novel representations of temporal data in electronic health records are explored. These representations retain the sequential information, and are directly compatible with standard machine learning algorithms. The explored methods are based on symbolic sequence representations of time series data, which are utilized in a number of different ways. An empirical investigation, using 19 datasets comprising clinical measurements observed over time from a real database of electronic health records, shows that using a distance measure to random subsequences leads to substantial improvements in predictive performance compared to using the original sequences or clustering the sequences. Evidence is moreover provided on the quality of the symbolic sequence representation by comparing it to sequences that are generated using domain knowledge by clinical experts. The proposed method creates representations that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.


Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance

Abstract: Multiple EHR components provide a more consistent and higher performance than a single one for the selected phenotypes. We suggest considering multiple EHR components for future phenotyping design in order to obtain an ideal result.
