Background
Carotid plaque can progress to stroke, myocardial infarction, and other events that are major causes of death worldwide. Evidence shows a significant increase in carotid plaque incidence among patients with fatty liver disease. However, unlike fatty liver disease, which has a high detection rate, carotid plaque is not yet routinely screened for in the asymptomatic population because of cost-effectiveness concerns. As a result, many patients have undetected carotid plaques, especially those with fatty liver disease.

Objective
This study aimed to combine the advantages of machine learning (ML) and logistic regression to develop a straightforward prediction model that identifies individuals at risk of carotid plaque in the population with fatty liver disease.

Methods
Our study included 5,420,640 participants with fatty liver from Meinian Health Care Center. We used 3 ML algorithms (random forest, elastic net [EN], and extreme gradient boosting [XGBoost]) to select important features from the potential predictors. Features selected by all 3 models were entered into a logistic regression analysis to develop a carotid plaque prediction model. Model performance was evaluated using the area under the receiver operating characteristic curve, the calibration curve, the Brier score, and decision curve analysis in both a randomly split internal validation data set and an external validation data set comprising 32,682 participants from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined from the Youden index, the predicted probability distribution, and the prevalence rate in the internal validation data set, and were used to classify participants into high-, intermediate-, and low-risk groups. This risk classification was further validated in the external validation data set.

Results
Among the participants, 26.23% (1,421,970/5,420,640) in the development data set and 21.64% (7074/32,682) in the external validation data set were diagnosed with carotid plaque.
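The consensus step in the Methods, where only features selected by all 3 models enter the logistic regression, reduces to a set intersection. The following is a minimal Python sketch under stated assumptions: the abstract does not say how each model decides which features are "important", so the selection rules used here (top-k importance for random forest and XGBoost, nonzero coefficient for elastic net), the function names, the feature names, and every score are hypothetical.

```python
# Consensus feature selection: keep only features chosen by every model.
# Minimal sketch; the per-model selection rules and all scores below are
# hypothetical illustrations, not the study's actual settings.


def top_k(importances: dict[str, float], k: int) -> set[str]:
    """Return the k features with the largest importance scores
    (an assumed rule for random forest / XGBoost importances)."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def nonzero(coefficients: dict[str, float], tol: float = 1e-8) -> set[str]:
    """Return features whose elastic-net coefficient was not shrunk to zero."""
    return {name for name, coef in coefficients.items() if abs(coef) > tol}


def consensus(*selections: set[str]) -> set[str]:
    """Intersect the feature sets selected by each model."""
    if not selections:
        return set()
    result = set(selections[0])
    for selected in selections[1:]:
        result &= selected
    return result


if __name__ == "__main__":
    # Hypothetical scores; in practice these would come from fitted models.
    rf = {"age": 0.30, "sbp": 0.20, "ldl_c": 0.15, "bmi": 0.10, "alt": 0.05}
    xgb = {"age": 0.35, "ldl_c": 0.18, "sbp": 0.12, "alt": 0.08, "bmi": 0.02}
    en = {"age": 0.9, "sbp": 0.4, "ldl_c": 0.2, "bmi": 0.0, "alt": 0.1}

    selected = consensus(top_k(rf, 3), top_k(xgb, 3), nonzero(en))
    print(sorted(selected))  # ['age', 'ldl_c', 'sbp']
```

Requiring agreement from all three models trades recall for stability: a feature survives only if tree-based importance and penalized-regression shrinkage both keep it.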
Of the 27 candidate predictors, 6 features were selected by all 3 ML models: age, systolic blood pressure, low-density lipoprotein cholesterol (LDL-C), total cholesterol, fasting blood glucose, and hepatic steatosis index (HSI). After collinearity between features was addressed, a logistic regression model built on the remaining 5 independent predictors achieved an area under the curve (AUC) of 0.831 in the internal validation data set and 0.801 in the external validation data set, and its calibration curves indicated good calibration. Its predictive performance was competitive overall with that of logistic regression or the ML algorithms used alone. Predicted probability cutoffs of 25% and 65% were chosen to classify individuals into low-, intermediate-, and high-risk categories for carotid plaque.

Conclusions
Combining ML and logistic regression yielded a practical carotid plaque prediction model with substantial public health implications for the early identification and risk assessment of carotid plaque among individuals with fatty liver.
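The abstract does not state how collinearity was detected or which of the 6 selected features was removed. One common screen, shown here as an assumption rather than the authors' procedure, flags feature pairs whose absolute Pearson correlation reaches a threshold (0.8 here, also an assumption); variance inflation factors are a frequent alternative. Function names and toy values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(
    data: dict[str, list[float]], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Flag feature pairs whose |r| meets the (assumed) threshold."""
    flagged = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


if __name__ == "__main__":
    # Toy values, made up for illustration only.
    toy = {
        "age": [40.0, 60.0, 50.0, 45.0],
        "ldl_c": [2.0, 3.0, 4.0, 5.0],
        "tc": [4.0, 5.0, 6.0, 7.0],
    }
    for a, b, r in collinear_pairs(toy):
        print(a, b, round(r, 3))  # ldl_c tc 1.0
```

One feature from each flagged pair would then be dropped before fitting the logistic regression; which one to keep is a modeling judgment the abstract does not describe.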
BACKGROUND
Carotid plaque can progress to stroke, myocardial infarction, and other events that are leading causes of death globally. Evidence demonstrates that the incidence of carotid plaque is significantly increased in patients with fatty liver disease. However, unlike fatty liver disease, which has a high detection rate, carotid plaque is not yet routinely screened for in the asymptomatic population because of cost-effectiveness concerns.

OBJECTIVE
This study aimed to combine the advantages of machine learning and logistic regression to develop a straightforward prediction model that identifies individuals at risk of carotid plaque in the population with fatty liver disease.

METHODS
A total of 5,420,640 participants with fatty liver from Meinian Healthcare Center were included in our study. Three machine learning algorithms (random forest, elastic net, and XGBoost) were used to select important features from the potential predictors, and features selected by all three models were entered into a logistic regression analysis to develop a carotid plaque prediction model for the population with fatty liver. Model performance was evaluated using the area under the receiver operating characteristic curve, the calibration curve, the Brier score, and decision curve analysis in both a randomly split internal validation dataset and an external validation dataset from MJ Health Check-up Center. Risk cutoff points for carotid plaque were determined from the large development dataset and verified on the external validation dataset.

RESULTS
Among the participants, 1,421,970 (26.23%) were diagnosed with carotid plaque. Of the 27 candidate predictors, six features were selected by all three machine learning models: age, systolic blood pressure, low-density lipoprotein cholesterol, total cholesterol, fasting blood glucose, and hepatic steatosis index.
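The evaluation measures named in the Methods can all be computed from observed outcomes and predicted probabilities. This is a minimal pure-Python sketch; the function names and toy data are hypothetical, and the pairwise AUC loop is O(n²), chosen for readability rather than for a cohort of millions (a rank-based or library implementation would be used at that scale). The Youden-index threshold is only one input to the risk cutoffs: the study also considered the predicted probability distribution and prevalence, so this function alone is not expected to reproduce the reported 25% and 65%.

```python
def auc(y_true: list[int], y_prob: list[float]) -> float:
    """ROC AUC as the probability that a random positive case is scored
    above a random negative case (ties count one half). O(n^2) for clarity."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def youden_cutoff(y_true: list[int], y_prob: list[float]) -> tuple[float, float]:
    """Threshold maximizing J = sensitivity + specificity - 1, treating
    prob >= threshold as positive. Ties keep the lower threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcome classes")
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def net_benefit(y_true: list[int], y_prob: list[float], pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < pt < 1.0:
        raise ValueError("pt must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= pt)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= pt)
    return tp / n - (fp / n) * pt / (1.0 - pt)


if __name__ == "__main__":
    # Toy outcomes (1 = plaque) and predicted probabilities, made up.
    y = [0, 0, 1, 1, 0, 1]
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
    print(round(auc(y, p), 3))  # 0.889
    print(round(brier(y, p), 3))  # 0.127
    print(youden_cutoff(y, p)[0])  # 0.35
    print(round(net_benefit(y, p, 0.5), 3))  # 0.333
```

A decision curve is obtained by evaluating `net_benefit` over a grid of threshold probabilities and comparing it with the "treat all" and "treat none" strategies.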
The logistic regression model built with five of these predictors (after collinearity between features was addressed) achieved an area under the curve (AUC) of 0.831 in the internal validation dataset and 0.801 in the external validation dataset, and its calibration curves indicated good calibration. Its predictive performance was competitive overall with that of logistic regression or machine learning algorithms used alone. Predicted probabilities of 25% and 65% were determined as the cutoff points separating low, intermediate, and high risk of carotid plaque.

CONCLUSIONS
Combining machine learning and logistic regression outperformed either approach used alone in establishing a straightforward and practical carotid plaque prediction model, which is of great value for the early identification and risk assessment of carotid plaque in the population with fatty liver.
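Applying the reported cutoffs to a logistic model's output can be sketched as below. The abstract does not report the fitted coefficients, so `predicted_probability` takes them as parameters and none are supplied; whether a probability of exactly 25% or 65% falls in the lower or upper tier is also not stated, and the `<` boundaries used here are an assumption. Function names are hypothetical.

```python
import math

# Cutoffs reported in the abstract; boundary handling (< vs <=) is assumed.
LOW_CUTOFF = 0.25
HIGH_CUTOFF = 0.65


def predicted_probability(
    intercept: float, coefficients: dict[str, float], features: dict[str, float]
) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    linear = intercept + sum(b * features[name] for name, b in coefficients.items())
    # Split by sign so math.exp never receives a large positive argument.
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    z = math.exp(linear)
    return z / (1.0 + z)


def risk_tier(prob: float, low: float = LOW_CUTOFF, high: float = HIGH_CUTOFF) -> str:
    """Map a predicted probability to a low/intermediate/high risk tier."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob < low:
        return "low"
    if prob < high:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    print([risk_tier(p) for p in (0.10, 0.40, 0.80)])
    # ['low', 'intermediate', 'high']
```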