Electronic health record (EHR) databases could enable accurate delineation of psychiatric disease trajectories at an unprecedented scale. Using EHR data from a single institution (Clinica San Juan de Dios in Manizales, Colombia), we characterize the diagnostic trajectories of >22,000 individuals (ages 4-90, 60% female) treated for severe mental illness (SMI), including schizophrenia (SCZ), bipolar disorder (BD), and severe or recurrent major depressive disorder (MDD).
We extracted diagnostic codes, clinical notes, and healthcare use data collected since 2005. In a subsample of 105 SMI patients, we assessed diagnostic reliability by comparing EHR diagnostic codes with diagnoses from clinical chart review; the two showed very good agreement (Cohen's kappa 0.78). Using 3,600 annotated sentences from 2,788 patients, we developed a natural language processing (NLP) pipeline for extracting clinical features from the electronic text, which showed high agreement with gold-standard annotations (average F1 0.88). Factors associated with diagnostic instability, defined as changes in diagnosis between successive visits, were identified using mixed-effects logistic regression models.
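As an illustration of the two quantities defined above, the following minimal Python sketch computes Cohen's kappa between two sets of diagnostic labels and flags changes in diagnosis between successive visits. It is not the study's code: the function names `cohens_kappa` and `instability_flags` and the toy labels are assumptions made for illustration, and the sketch treats each visit as carrying a single primary diagnosis, which may be simpler than the definition used in the analysis.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def instability_flags(diagnoses):
    """Return 1 for each visit whose diagnosis differs from the previous visit."""
    return [int(cur != prev) for prev, cur in zip(diagnoses, diagnoses[1:])]


# Toy data (hypothetical): EHR codes vs. chart-review diagnoses for 4 patients.
ehr = ["SCZ", "BD", "MDD", "BD"]
chart = ["SCZ", "BD", "MDD", "MDD"]
kappa = cohens_kappa(ehr, chart)

# Toy visit sequence (hypothetical) for one patient.
flags = instability_flags(["MDD", "MDD", "BD", "BD", "SCZ"])
```

In the actual analysis, per-visit flags of this kind would be the outcome of a mixed-effects logistic regression with a patient-level random effect; that model is not reproduced here.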
Of SMI patients with >3 visits (n=12,962), 64% had multiple EHR diagnoses, reflecting diagnostic switches (19%), comorbidities (30%), or both (15%). While some diagnostic switches are common, such as the switch from MDD to BD (observed in 22% of BD patients), trajectories are highly heterogeneous, with rare trajectories (each occurring in <1% of patients) together accounting for the majority (58%) of all patients. Predictors of diagnostic instability include time since initial visit (OR 0.56 by visit number, p-value 2e-66), previous diagnostic change (OR 4.02, p-value 3e-250), and NLP-derived descriptions of delusions (OR 1.50, p-value 2e-18).
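The share of patients following rare trajectories can be sketched as follows. This is a hypothetical illustration rather than the study's method: it assumes a trajectory is the ordered sequence of a patient's diagnoses with consecutive repeats collapsed, and the names `collapse` and `rare_share` and the toy data are invented for the example.

```python
from collections import Counter


def collapse(visits):
    """Collapse consecutive repeated diagnoses into one trajectory tuple."""
    trajectory = []
    for diagnosis in visits:
        if not trajectory or trajectory[-1] != diagnosis:
            trajectory.append(diagnosis)
    return tuple(trajectory)


def rare_share(patient_visits, threshold=0.01):
    """Fraction of patients whose trajectory is followed by < threshold of patients."""
    trajectories = [collapse(v) for v in patient_visits.values()]
    counts = Counter(trajectories)
    n = len(trajectories)
    # Sum the patients belonging to trajectories below the frequency threshold.
    rare_patients = sum(c for c in counts.values() if c / n < threshold)
    return rare_patients / n


# Toy cohort (hypothetical); a 50% threshold is used only because it is tiny.
toy = {
    "p1": ["MDD", "MDD", "BD"],
    "p2": ["MDD", "BD"],
    "p3": ["SCZ"],
    "p4": ["BD", "SCZ"],
}
share = rare_share(toy, threshold=0.5)
```

With the abstract's definition, `threshold` would stay at its default of 0.01, so that many individually uncommon trajectories can jointly cover most of the cohort.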
Our results underline the importance of considering longitudinal rather than cross-sectional diagnoses in psychiatric research and show how high-quality EHR data can contribute to global efforts to understand disease trajectories.