Electronic medical record phenotyping using the anchor and learn framework

Halpern, Yoni; Horng, Steven; Choi, Youngduck; Sontag, David

doi:10.1093/jamia/ocw011

Cited by 150 publications

(135 citation statements)

References 21 publications

Supporting

Mentioning: 135

Contrasting


“…Methodologically speaking, however, learning phenotypes from noisy labels has two ongoing research directions that are interrelated, i.e. modeling phenotypes from anchor variables [44,45] and silver-standard training data [46]. We also note that from the perspective of learning shared representations of diseases (such as the abstraction feature representation in this study), contemporary phenotyping effort has led to a growing body of work that learns phenotypes from population-scale clinical data using the methodology of representation learning [47,48] including i) spectral learning such as non-negative tensor factorization [5], ii) probabilistic mixture models [8], and additionally, when temporal phenotypic patterns are considered, iii) unsupervised feature learning using autoencoders [32] and latent medical concepts [49], etc., and iv) deep learning [6].…”

Section: Discussion (mentioning)

confidence: 99%

EHR-based phenotyping: Bulk learning and evaluation

Chiu

Hripcsak

2017

Journal of Biomedical Informatics


In data-driven phenotyping, a core computational task is to identify medical concepts and their variations from sources of electronic health records (EHR) to stratify phenotypic cohorts. A conventional analytic framework for phenotyping largely uses a manual knowledge engineering approach or a supervised learning approach where clinical cases are represented by variables encompassing diagnoses, medicinal treatments and laboratory tests, among others. In such a framework, tasks associated with feature engineering and data annotation remain a tedious and expensive exercise, resulting in poor scalability. In addition, certain clinical conditions, such as those that are rare and acute in nature, may never accumulate sufficient data over time, which poses a challenge to establishing accurate and informative statistical models. In this paper, we use infectious diseases as the domain of study to demonstrate a hierarchical learning method based on ensemble learning that attempts to address these issues through feature abstraction. We use a sparse annotation set to train and evaluate many phenotypes at once, which we call bulk learning. In this batch-phenotyping framework, disease cohort definitions can be learned from within the abstract feature space established by using multiple diseases as a substrate and diagnostic codes as surrogates. In particular, using surrogate labels for model training renders possible its subsequent evaluation using only a sparse annotated sample. Moreover, statistical models can be trained and evaluated, using the same sparse annotation, from within the abstract feature space of low dimensionality that encapsulates the shared clinical traits of these target diseases, collectively referred to as the bulk learning set.


“…However, in practice, the conditional independence property does not have to be completely satisfied [12]. On the other hand, if property 1 is relaxed, the false positive rate will automatically increase.…”

Section: Predictive Anchors Via Exploratory Analysis (mentioning)

confidence: 99%

“…To overcome this drawback Halpern et al proposed a very promising framework, with a large number of possible applications. In this framework, which we refer to as the anchor method (AM), one can learn phenotypes and predict clinical state variables from EHR unlabeled data only by specifying a few key observations called anchors [11,12]. An underlying assumption is that the presence of an anchor variable implies the presence of the latent label of interest.…”

Section: Introduction (mentioning)

confidence: 99%

Using anchors from free text in electronic health records to diagnose postoperative delirium

Mikalsen

Soguero-Ruíz

Jensen

et al. 2017

Computer Methods and Programs in Biomedicine


Objectives. Postoperative delirium is a common complication after major surgery among the elderly. Despite its potentially serious consequences, the complication often goes undetected and undiagnosed. In order to provide diagnosis support one could potentially exploit the information hidden in free text documents from electronic health records using data-driven clinical decision support tools. However, these tools depend on labeled training data and can be both time consuming and expensive to create. Methods. The recent "Learning with Anchors" framework resolves this problem by transforming key observations (anchors) into labels. This is a promising framework, but it is heavily reliant on clinician's knowledge for specifying good anchor choices in order to perform well. In this paper we propose a novel method for specifying anchors from free text documents, following an exploratory data analysis approach based on clustering and data visualization techniques. We investigate the use of the new framework as a way to detect postoperative delirium. Results. By applying the proposed method to medical data gathered from a Norwegian University Hospital, we increase the area under the precision-recall curve from 0.51 to 0.96 compared to baselines. Conclusions. The proposed approach can be used as a framework for clinical decision support for postoperative delirium.
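The anchor assumption quoted above (an anchor, when present, implies the latent label) is often paired with a calibration step from positive-unlabeled learning. The following is a minimal sketch of that step only, not the implementation used by any of the papers listed here. It assumes a classifier has already been trained to predict the anchor from the remaining features, with the anchor itself hidden from the inputs, and that the anchor is conditionally independent of those features given the label. The function name `calibrate_anchor_scores` and the toy numbers are hypothetical.

```python
from statistics import mean


def calibrate_anchor_scores(anchor_probs, has_anchor):
    """Rescale estimates of P(anchor | x) into estimates of P(label | x).

    Assumption (illustrative, not taken from the cited papers' code):
    the anchor only fires when the label is present and is conditionally
    independent of the other features given the label, so that
    P(label | x) = P(anchor | x) / c, where c = P(anchor | label).
    """
    # c is estimated as the mean predicted anchor probability over the
    # records in which the anchor was actually observed.
    positives = [p for p, a in zip(anchor_probs, has_anchor) if a]
    if not positives:
        raise ValueError("need at least one anchor-positive record")
    c = mean(positives)
    # Clip at 1.0 because the ratio can exceed 1 with noisy estimates.
    return [min(1.0, p / c) for p in anchor_probs]


if __name__ == "__main__":
    # Hypothetical classifier outputs for four patient records.
    probs = [0.5, 0.3, 0.1, 0.2]
    anchors = [True, True, False, False]
    for p, s in zip(probs, calibrate_anchor_scores(probs, anchors)):
        print(f"P(anchor|x)={p:.2f} -> estimated P(label|x)={s:.2f}")
```

In practice the constant would be estimated on held-out records rather than on the training set. As the citation statement above notes, relaxing the anchor assumption raises the false positive rate, since records carrying the anchor are then no longer guaranteed to be true cases.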


“…A content summary of these selected papers can be found in the appendix of this synopsis. [2], the authors demonstrated the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record [3]. They validated the phenotype models in the context of Type 2 diabetes mellitus (T2DM) and Myocardial Infarcts (MI) using respectively the phenotype definitions of the eMERGE [1] and OMOP [4] initiatives.…”

Section: About the Paper Selection (mentioning)

confidence: 99%

“…Using the Halpern et al method based on "anchor" terms [2], they defined a list of keywords specific to the phenotypes of interest to semi-automatically generate noisy labeled training data. Then, a sample of 1,500 patient records -750 patient records for each phenotype having a "noisy" label for the phenotype and 750 controls taken in the extract disjoint with possible cases (silver standard) -was used to train the XPRESS (eXtraction of Phenotypes from Records using Silver Standards) model.…”

Section: Summary Of Best Papers (mentioning)

confidence: 99%

Clinical Research Informatics: Contributions from 2016

Daniel

Choquet

2017

Yearb Med Inform


Summary. Objectives: To summarize key contributions to current research in the field of Clinical Research Informatics (CRI) and to select the best papers published in 2016. Methods: A bibliographic search using a combination of MeSH and free terms on CRI was performed using PubMed, followed by a double-blind review in order to select a list of candidate best papers to be then peer-reviewed by external reviewers. A consensus meeting between the two section editors and the editorial team was organized to finally conclude on the selection of best papers. Results: Among the 452 papers published in 2016 in the various areas of CRI and returned by the query, the full review process selected four best papers. The authors of the first paper utilized a comprehensive representation of the patient medical record and semi-automatically labeled training sets to create phenotype models via a machine learning process. The second selected paper describes an open source tool chain securely connecting ResearchKit compatible applications (Apps) to the widely-used clinical research infrastructure Informatics for Integrating Biology and the Bedside (i2b2). The third selected paper describes the FAIR Guiding Principles for scientific data management and stewardship. The fourth selected paper focuses on the evaluation of the risk of privacy breaches in releasing genomics datasets.
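The silver-standard construction described in the citation statement above (keyword "anchor" terms producing noisy cases, plus an equal number of controls drawn from records that match no keyword) can be sketched roughly as follows. This is an illustration under stated assumptions, not the XPRESS pipeline: the function name `build_silver_standard`, the plain substring matching, and the example notes are all hypothetical.

```python
import random


def build_silver_standard(records, keywords, n_per_class, seed=0):
    """Return (texts, noisy_labels) with n_per_class cases and controls.

    Hypothetical sketch: a record is a noisy case if it mentions any
    keyword (case-insensitive substring match), and controls are sampled
    only from records that mention none of them.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    lowered = [kw.lower() for kw in keywords]

    def mentions_keyword(text):
        return any(kw in text.lower() for kw in lowered)

    cases = [r for r in records if mentions_keyword(r)]
    controls = [r for r in records if not mentions_keyword(r)]
    if len(cases) < n_per_class or len(controls) < n_per_class:
        raise ValueError("not enough records for the requested sample size")

    sample = [(r, 1) for r in rng.sample(cases, n_per_class)]
    sample += [(r, 0) for r in rng.sample(controls, n_per_class)]
    rng.shuffle(sample)
    return [r for r, _ in sample], [y for _, y in sample]


if __name__ == "__main__":
    # Made-up notes for illustration only.
    notes = [
        "Started metformin for glycemic control",
        "Metformin dose increased",
        "Patient on metformin and insulin",
        "Ankle sprain, advised rest",
        "Seasonal allergies, no new medication",
        "Routine follow-up, no complaints",
    ]
    texts, labels = build_silver_standard(notes, ["metformin"], n_per_class=2)
    print(list(zip(labels, texts)))
```

With the sizes quoted above, `n_per_class` would be 750 per phenotype. The resulting labels are noisy by construction, so any model trained on them would still need evaluation against a manually reviewed sample.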

