INTRODUCTION
Identifying mild cognitive impairment (MCI) patients at risk for dementia could facilitate early intervention. Using electronic health records (EHRs), we developed a model to predict conversion from MCI to all‐cause dementia (ACD) within 5 years.
METHODS
A Cox proportional hazards model was used to identify predictors of ACD conversion from EHR data in veterans with MCI. Model performance (area under the receiver operating characteristic curve [AUC] and Brier score) was evaluated on a held‐out data subset.
RESULTS
Of 59,782 MCI patients, 15,420 (25.8%) converted to ACD. The model had good discrimination (AUC 0.73 [95% confidence interval (CI) 0.72–0.74]) and calibration (Brier score 0.18 [95% CI 0.17–0.18]). Age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors, while higher body mass index, alcohol abuse, and sleep apnea were protective factors.
DISCUSSION
The EHR‐based prediction model performed well in identifying 5‐year MCI to ACD conversion and has the potential to assist in triaging at‐risk patients.
Highlights
Of 59,782 veterans with mild cognitive impairment (MCI), 15,420 (25.8%) converted to all‐cause dementia within 5 years.
Electronic health record prediction models demonstrated good performance (area under the receiver operating characteristic curve 0.73; Brier score 0.18).
Age and vascular‐related morbidities were predictors of dementia conversion.
Models using synthetic data performed comparably to models using real data in predicting MCI to dementia conversion.
Key Points
An electronic health record–based model using demographic and co‐morbidity data had good performance in identifying veterans who convert from mild cognitive impairment (MCI) to all‐cause dementia (ACD) within 5 years.
Older age, stroke, cerebrovascular disease, myocardial infarction, hypertension, and diabetes were risk factors for 5‐year conversion from MCI to ACD.
High body mass index, alcohol abuse, and sleep apnea were protective factors for 5‐year conversion from MCI to ACD.
Models using synthetic data (analogs of real patient data that retain the distributions, densities, and covariances of the real variables but cannot be attributed to any specific patient) performed as well as models using real patient data. This could facilitate widely distributed computing on health‐care data while minimizing patient privacy concerns, potentially accelerating scientific discovery.
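The two performance metrics reported here, AUC and Brier score, can be illustrated with a minimal sketch. It assumes a simplified setting that is not taken from the study: each held‐out patient has an observed 0/1 outcome (1 = converted to ACD within 5 years) with no censoring, and a predicted 5‐year conversion probability (for a Cox model this would be 1 − S(5 years | x)). The function names `brier_score` and `roc_auc` are illustrative, not the authors' code, and the sketch does not handle the censoring adjustments a survival analysis would normally require.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and observed 0/1 outcome.

    Lower is better; 0.0 means every prediction matched the outcome exactly.
    """
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen converter receives a higher
    predicted risk than a randomly chosen non-converter (ties count as 0.5).

    This pairwise version is O(n_pos * n_neg) and is meant for illustration only.
    """
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one converter and one non-converter")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy held-out set (hypothetical values, not study data).
    outcomes = [1, 0, 1, 0, 0]
    predicted = [0.8, 0.3, 0.4, 0.5, 0.1]
    print("Brier score:", brier_score(outcomes, predicted))
    print("AUC:", roc_auc(outcomes, predicted))
```

On the toy data above, the Brier score is about 0.15 and the AUC is 5/6 (about 0.83): one of the six converter/non‐converter pairs (0.4 vs 0.5) is ranked in the wrong order. AUC reflects only ranking (discrimination), whereas the Brier score also penalizes probabilities that are poorly scaled, which is why the two are usually reported together.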