Development and Multi-Site External Validation of a Generalizable Risk Prediction Model for Bipolar Disorder

Walsh, Colin G.; Ripperger, Michael; Hu, Yirui; Sheu, Yi-han; Wilimitis, Drew; Zheutlin, Amanda B; Rocha, Daniel; Choi, Karmel W.; Castro, Víctor M.; Kirchner, H. Lester; Chabris, Christopher F.; Davis, Lea K.; Smoller, Jordan W.

doi:10.1101/2023.02.21.23286251

Cited by 1 publication (1 citation statement)

References 31 publications

“…The availability of large-scale, real-world healthcare data and progress in machine learning has provided an opportunity to develop accurate, scalable tools for risk stratification and screening. A recent study used such methods to develop EHR-based algorithms for the prediction of BD but those analyses were restricted to adult patients [56]. In this study, we developed various machine learning models to identify youth at risk of BD for three clinical use cases: a general cohort of all youth in a health system, youth with a history of mental healthcare, and youth with prior diagnosis of a (non-BD) mood disorder or ADHD.…”

Section: Discussion (mentioning)

confidence: 99%

Machine Learning Models for the Prediction of Early-Onset Bipolar Using Electronic Health Records

Wang, Sheu, Lee et al. 2024

Preprint

Objective: Early identification of bipolar disorder (BD) provides an important opportunity for timely intervention. In this study, we aimed to develop machine learning models using large-scale electronic health record (EHR) data including clinical notes for predicting early-onset BD.

Method: Structured and unstructured data were extracted from the longitudinal EHR of the Mass General Brigham health system. We defined three cohorts aged 10–25 years: (1) the full youth cohort (N=300,398); (2) a sub-cohort defined by having a mental health visit (N=105,461); (3) a sub-cohort defined by having a diagnosis of mood disorder or ADHD (N=35,213). By adopting a prospective landmark modeling approach that aligns with clinical practice, we developed and validated a range of machine learning models including neural network-based models, across different cohorts and prediction windows.

Results: We found the two tree-based models, Random forests (RF) and light gradient-boosting machine (LGBM), achieving good discriminative performance across different clinical settings (area under the receiver operating characteristic curve 0.76-0.88 for RF and 0.74-0.89 for LGBM). In addition, we showed comparable performance can be achieved with a greatly reduced set of features, demonstrating computational efficiency can be attained without significant compromise of model accuracy.

Conclusion: Good discriminative performance for early-onset BD is achieved utilizing large-scale EHR data. Our study offers a scalable and accurate method for identifying youth at risk for BD that could help inform clinical decision making and facilitate early intervention. Future work includes evaluating the portability of our approach to other healthcare systems and exploring considerations regarding possible implementation.
