Motivation
Machine learning (ML)-based stroke risk stratification systems have typically focused on conventional risk factors (CRF) (AtheroRisk-conventional). Besides CRF, carotid ultrasound image phenotypes (CUSIP) have been shown to be powerful phenotypes for risk stratification. This is the first ML study of its kind to integrate CUSIP and CRF for risk stratification (AtheroRisk-integrated) and to compare it against AtheroRisk-conventional.
Methods
Two ML-based setups, (i) AtheroRisk-integrated and (ii) AtheroRisk-conventional, were developed using random forest (RF) classifiers. AtheroRisk-conventional uses a feature set of 13 CRF: age, gender, hemoglobin A1c, fasting blood sugar, low-density lipoprotein and high-density lipoprotein (HDL) cholesterol, total cholesterol (TC), the ratio of TC to HDL, hypertension, smoking, family history, triglyceride, and ultrasound-based carotid plaque score. The AtheroRisk-integrated system uses a feature set of 38 features combining the 13 CRF with 25 CUSIP features (6 types of current CUSIP, 6 types of 10-year CUSIP, 12 types of quadratic CUSIP (harmonics), and age-adjusted grayscale median). A logistic regression approach was used to select the significant features on which the RF classifier was trained. The performance of both ML systems was evaluated by the area under the curve (AUC), computed using a leave-one-out cross-validation protocol.
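The pipeline described above (logistic-regression-based feature selection, an RF classifier, and AUC under leave-one-out cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic (generated with `make_classification`), scikit-learn's `SelectFromModel` with a mean-coefficient threshold stands in for the significance-based selection, and all hyperparameters are hypothetical. Because each leave-one-out fold holds a single sample, the AUC is computed once over the pooled out-of-fold probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

# ASSUMPTION: synthetic stand-in data with 38 features, mirroring only the
# size of the integrated feature set; it is not the study's patient data.
X, y = make_classification(
    n_samples=60, n_features=38, n_informative=6, random_state=0
)

# ASSUMPTION: keep features whose absolute logistic-regression coefficient
# is at least the mean; a stand-in for significance-based selection.
selector = SelectFromModel(LogisticRegression(max_iter=1000), threshold="mean")

# Selection sits inside the pipeline so it is refit on each training fold,
# which keeps the held-out sample out of the feature-selection step.
model = make_pipeline(
    selector,
    RandomForestClassifier(n_estimators=50, random_state=0),
)

# Leave-one-out: every sample is predicted by a model trained on the rest.
proba = cross_val_predict(
    model, X, y, cv=LeaveOneOut(), method="predict_proba"
)

# Pool the out-of-fold probabilities of the positive class into one AUC.
auc = roc_auc_score(y, proba[:, 1])
print(f"Leave-one-out AUC: {auc:.3f}")
```

Running the same sketch on a CRF-only feature matrix and on the combined matrix would give the two AUC values to compare.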
Results
Left and right common carotid arteries of 202 Japanese patients were retrospectively examined to obtain 404 ultrasound scans. The RF classifier showed an improvement in AUC of ~57% under the leave-one-out cross-validation protocol. Using the RF classifier, the AUC for the AtheroRisk-integrated system was higher (AUC = 0.99, p-value < 0.001) than for AtheroRisk-conventional (AUC = 0.63, p-value < 0.001).
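The ~57% figure is numerically consistent with the relative AUC gain of the integrated system over the conventional one. This reading is an assumption, since the text does not state how the percentage was computed:

```latex
\[
\frac{\mathrm{AUC}_{\text{integrated}} - \mathrm{AUC}_{\text{conventional}}}{\mathrm{AUC}_{\text{conventional}}}
= \frac{0.99 - 0.63}{0.63} \approx 0.571 \approx 57\%
\]
```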
Conclusion
With an RF classifier, the AtheroRisk-integrated ML system outperforms the AtheroRisk-conventional ML system.