Electronic Health Record Based Algorithm to Identify Patients with Autism Spectrum Disorder

Lingren, Todd; Chen, Pei; Bochenek, Joseph; Doshi‐Velez, Finale; Manning-Courtney, Patty; Bickel, Julie; Welchons, Leah Wildenger; Reinhold, Judy; Bing, Nicole M.; Ni, Yizhao; Barbaresi, William J.; Mentch, Frank; Basford, Melissa A.; Denny, Joshua C.; Vazquez, Lyam; Perry, Cassandra; Namjou, Bahram; Qiu, Haijun; Connolly, John J.; Abrams, Debra; Holm, Ingrid A.; Cobb, Beth A.; Lingren, Nataline; Solti, Imre; Hákonarson, Hákon; Kohane, Isaac S.; Harley, John B.; Savova, Guergana

doi:10.1371/journal.pone.0159621

Cited by 67 publications

(76 citation statements)

References 34 publications

“…The subgroups also differed by age of ASD diagnosis (earliest in the psychiatric comorbidity group) and prevalence of intellectual disability (highest in the subtype enriched for seizure disorders). In a subsequent larger analysis, investigators in the eMERGE network [Lingren and others 2016] were largely able to recapitulate these clusters using an NLP-derived algorithm applied across multiple institutions in more than 20,000 ASD patients. The identification of novel data-driven subtypes provides an enticing opportunity for genomic studies of psychopathology where the heterogeneity of clinical syndromes is widely assumed.…”

Section: Application: Phenotypic Clusters and Subtyping

mentioning

confidence: 99%

The use of electronic health records for psychiatric phenotyping and genomics

Smoller

2017

American J of Med Genetics Pt B

102

The widespread adoption of electronic health records (EHRs) in healthcare systems has created a vast and continuously growing resource of clinical data and provides new opportunities for population-based research. In particular, the linking of EHRs to biospecimens and genomic data in biobanks may help address what has become a rate-limiting step for genetic research: the need for large sample sizes. The principal roadblock to capitalizing on these resources is the need to establish the validity of phenotypes extracted from the EHR. For psychiatric genetic research, this represents a particular challenge given that diagnosis is based on patient reports and clinician observations that may not be well captured in billing codes or narrative records. This review addresses the opportunities and pitfalls in EHR-based phenotyping with a focus on their application to psychiatric genetic research. A growing number of studies have demonstrated that diagnostic algorithms with high positive predictive value can be derived from EHRs, especially when structured data are supplemented by text mining approaches. Such algorithms enable semi-automated phenotyping for large-scale case-control studies. In addition, the scale and scope of EHR databases have been used successfully to identify phenotypic subgroups and derive algorithms for longitudinal risk prediction. EHR-based genomics are particularly well suited to rapid look-up replication of putative risk genes, studies of pleiotropy (phenome-wide association studies, or PheWAS), investigations of genetic networks and overlap across the phenome, and pharmacogenomic research. EHR phenotyping has been relatively under-utilized in psychiatric genomic research but may become a key component of efforts to advance precision psychiatry.

“…3, 4 Statistically derived “computable phenotypes,” comprised of a composite of varying EHR data elements, accurately identify patients of interest in clinical data repositories. 5–12 …”

mentioning

confidence: 99%

A Computable Phenotype Improves Cohort Ascertainment in a Pediatric Pulmonary Hypertension Registry

Geva, Gronsbell, Cai, et al.

2017

The Journal of Pediatrics

Objectives: To compare registry and EHR data mining approaches for cohort ascertainment in patients with pediatric pulmonary hypertension (PH) in an effort to overcome some of the limitations of registry enrollment alone in identifying patients with disease phenotypes. Study design: This study was a single-center retrospective analysis of EHR and registry data at Boston Children’s Hospital. The local Informatics for Integrating Biology and the Bedside (i2b2) data warehouse was queried for billing codes, prescriptions, and narrative data related to pediatric PH. Computable phenotype algorithms were developed by fitting penalized logistic regression models to a physician-annotated training set. Algorithms were applied to a candidate patient cohort and performance was evaluated using a separate set of 136 records and 179 registry patients. We compared clinical and demographic characteristics of patients identified by computable phenotype and the registry. Results: The computable phenotype had an area under the ROC curve of 90% (95% CI 85% – 95%), positive predictive value of 85% (95% CI 77% – 93%), and identified 413 patients (an additional 231%) with pediatric PH not enrolled in the registry. Patients identified by the computable phenotype were clinically distinct from registry patients, with greater prevalence of diagnoses related to perinatal distress and left heart disease. Conclusions: Mining of EHRs using computable phenotypes identified a large cohort of patients not recruited using a classic registry. Fusion of EHR and registry data can improve cohort ascertainment for the study of rare diseases. Trial Registration: ClinicalTrials.gov: NCT02249923

“…The NLP algorithm had the best PPV (82%), in line with that of EHR algorithms for other neuropsychiatric disorders. 24, 32, 33 In addition, the algorithm detected patients by PCS keywords that were not captured by the coded algorithm. These keywords, however, were of questionable validity.…”

Section: Discussion

mentioning

confidence: 99%

Diagnostic algorithms to study post-concussion syndrome using electronic health records: validating a method to capture an important patient population

Dennis, Yengo‐Kahn, Kirby, et al.

2018

Preprint

Introduction: Post-concussion syndrome (PCS) is characterized by persistent cognitive, somatic, and emotional symptoms after a mild traumatic brain injury (mTBI). Genetic and other biological variables may contribute to PCS etiology, and the emergence of biobanks linked to electronic health records (EHR) offers new opportunities for research on PCS. We sought to validate the use of EHR data of PCS patients by comparing two diagnostic algorithms. Methods: Vanderbilt University Medical Center curates a de-identified database of 2.8 million patient EHR. We developed two EHR-based algorithmic approaches that identified individuals with PCS by: (i) natural language processing (NLP) of narrative text in the EHR combined with structured demographic, diagnostic, and encounter data; or (ii) coded billing and procedure data. The predictive value of each algorithm was assessed, and cases and controls identified by each approach were compared on demographic and medical characteristics. Results: First, the NLP algorithm identified 507 cases and 10,857 controls. The positive predictive value (PPV) in the cases was 82% and the negative predictive value in the controls was 78%. Second, the coded algorithm identified 1,142 patients with two or more PCS billing codes and had a PPV of 76%. Comparisons of PCS controls to both case groups recovered known epidemiology of PCS: cases were more likely than controls to be female and to have pre-morbid diagnoses of anxiety, migraine, and PTSD. In contrast, controls and cases were equally likely to have ADHD and learning disabilities, in accordance with the findings of recent systematic reviews of PCS risk factors. Conclusions: EHR are a valuable research tool for PCS. Ascertainment based on coded data alone had a predictive value comparable to an NLP algorithm, recovered known PCS risk factors, and maximized the number of included patients.
