Introduction
Post-concussion syndrome (PCS) is characterized by persistent cognitive, somatic, and emotional symptoms after a mild traumatic brain injury (mTBI). Genetic and other biological variables may contribute to PCS etiology, and the emergence of biobanks linked to electronic health records (EHR) offers new opportunities for research on PCS. We sought to validate the use of EHR data of PCS patients by comparing two diagnostic algorithms.

Methods
Vanderbilt University Medical Center curates a de-identified database of 2.8 million patient EHR. We developed two EHR-based algorithmic approaches that identified individuals with PCS by: (i) natural language processing (NLP) of narrative text in the EHR combined with structured demographic, diagnostic, and encounter data; or (ii) coded billing and procedure data. The predictive value of each algorithm was assessed, and cases and controls identified by each approach were compared on demographic and medical characteristics.

Results
First, the NLP algorithm identified 507 cases and 10,857 controls. The positive predictive value (PPV) in the cases was 82% and the negative predictive value in the controls was 78%. Second, the coded algorithm identified 1,142 patients with two or more PCS billing codes and had a PPV of 76%. Comparisons of PCS controls to both case groups recovered known epidemiology of PCS: cases were more likely than controls to be female and to have pre-morbid diagnoses of anxiety, migraine, and PTSD. In contrast, controls and cases were equally likely to have ADHD and learning disabilities, in accordance with the findings of recent systematic reviews of PCS risk factors.

Conclusions
EHR are a valuable research tool for PCS. Ascertainment based on coded data alone had a predictive value comparable to an NLP algorithm, recovered known PCS risk factors, and maximized the number of included patients.
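
Illustrative sketch
The coded ascertainment rule (two or more PCS billing codes) and the predictive-value calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation. The abstract does not list the codes used, so the code set below (ICD-9-CM 310.2 and ICD-10-CM F07.81) is an assumption. The function names, the choice to count code instances rather than distinct encounter dates, and the toy rows and review counts are all hypothetical.

```python
from collections import Counter

# Assumed code set for illustration only; the abstract does not list the codes used.
PCS_CODES = {"310.2", "F07.81"}


def coded_cases(billing_rows, min_codes=2):
    """Return the IDs of patients with at least `min_codes` PCS billing codes.

    `billing_rows` is an iterable of (patient_id, code) pairs. Each row counts
    once, which is an assumption about how "two or more codes" is tallied.
    """
    counts = Counter(pid for pid, code in billing_rows if code in PCS_CODES)
    return {pid for pid, n in counts.items() if n >= min_codes}


def predictive_value(confirmed, reviewed):
    """Fraction of manually reviewed charts that confirm the algorithm's label.

    For algorithm-labelled cases this is the PPV; for algorithm-labelled
    controls it is the NPV.
    """
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return confirmed / reviewed


# Hypothetical toy billing data, not drawn from the study.
rows = [
    ("p1", "310.2"),
    ("p1", "F07.81"),
    ("p2", "310.2"),      # only one PCS code: not a case
    ("p3", "S06.0X0A"),   # concussion code, not in the assumed PCS set
    ("p3", "310.2"),
    ("p3", "310.2"),
]

print(sorted(coded_cases(rows)))   # ['p1', 'p3']
print(predictive_value(8, 10))     # 0.8 (hypothetical review counts)
```

Keeping the threshold as a parameter (`min_codes`) makes it easy to examine how a stricter or looser rule would trade case count against predictive value.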