Background
Machine learning (ML) risk prediction models for post-stroke cognitive impairment (PSCI) are still far from optimal. This study aims to develop a reliable ML-based model for predicting PSCI in Chinese individuals.
Methods
We collected data on 494 patients who were diagnosed with acute ischemic stroke (AIS) and hospitalized at a Chinese medical institution between January 2022 and November 2023. The samples were randomly divided into a training set (70%) and a validation set (30%). Logistic regression combined with least absolute shrinkage and selection operator (LASSO) regression was used to screen the optimal predictive features of PSCI. Seven ML models (logistic regression [LR], XGBoost, LightGBM, AdaBoost, Gaussian naive Bayes [GNB], multilayer perceptron [MLP], and support vector machine [SVM]) were built on the selected features and their performance was compared. Five-fold cross-validation was used to measure each model's area under the curve (AUC), sensitivity, specificity, accuracy, F1 score, and average precision (AP) from the precision-recall (PR) curve. SHAP analysis was used to interpret the best-performing model.
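The workflow described above (70/30 split, LASSO-penalized logistic regression for feature screening, five-fold cross-validated model comparison) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's code: the data are synthetic, the 20 candidate predictors, the penalty strength `C=0.1`, and all random seeds are hypothetical, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so that the example depends on scikit-learn alone.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cohort: 494 rows, 20 hypothetical candidate predictors.
X, y = make_classification(n_samples=494, n_features=20, n_informative=5,
                           random_state=0)

# Random 70% / 30% split into training and validation sets, stratified on the outcome.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# L1-penalized (LASSO) logistic regression: features whose coefficient is
# shrunk to exactly zero are dropped. C=0.1 is an assumed penalty strength.
scaler = StandardScaler().fit(X_train)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1, random_state=0)
lasso.fit(scaler.transform(X_train), y_train)
selected = np.flatnonzero(lasso.coef_[0])
if selected.size == 0:  # guard: keep all features if the penalty removed every one
    selected = np.arange(X.shape[1])

# A subset of candidate models; the gradient-boosting model is a stand-in for XGBoost.
models = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "GNB": GaussianNB(),
    "GBM (stand-in for XGBoost)": GradientBoostingClassifier(random_state=0),
}

# Five-fold cross-validated AUC on the training set, selected features only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train[:, selected], y_train,
                             cv=cv, scoring="roc_auc")
    cv_auc[name] = scores.mean()
    print(f"{name}: mean CV AUC = {cv_auc[name]:.3f}")
```

The same loop accepts other scorers, for example `scoring="average_precision"` for the AP value. The held-out `X_val`, `y_val` would be used only once, to evaluate the chosen model.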
Results
PSCI was identified in 58.50% of the 494 eligible AIS patients. The most predictive features of PSCI were the 24-item Hamilton Depression Rating Scale (HAMD-24) score, fasting blood glucose (FBG), age, Pittsburgh Sleep Quality Index (PSQI) score, and paraventricular lesion. Among the seven ML models developed on these features, XGBoost performed best, with an AUC of 0.961, sensitivity of 0.931, specificity of 0.889, accuracy of 0.911, F1 score of 0.926, and AP of 0.967.
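For readers less familiar with the threshold-based metrics reported above, the sketch below shows how sensitivity, specificity, accuracy, and F1 score follow from a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)              # positive predictive value
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

AUC and AP are not recoverable from a single confusion matrix, because they summarize performance across all decision thresholds.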
Conclusion
The XGBoost model built on HAMD-24, FBG, age, PSQI, and paraventricular lesion performed exceptionally well in predicting the risk of PSCI. It provides clinicians with a reliable tool for early screening of cognitive impairment and for guiding treatment decisions in stroke patients.