Background: Although machine learning (ML)-based prediction of coronary artery disease (CAD) has gained increasing attention, assessing the severity of suspected CAD in symptomatic patients remains challenging. Methods: The training set consisted of 284 retrospectively enrolled participants, and the test set included 116 prospectively enrolled participants, from whom we collected 53 baseline variables and coronary angiography results. The data were pre-processed with outlier handling and one-hot encoding. In the first stage, we constructed an ML classifier that used baseline information to predict the presence of CAD as a binary outcome. In the second stage, baseline information was used to construct ML regression models for predicting the severity of CAD. The non-CAD population was included, and two different scores (SYNTAX and GENSINI) were used as output variables. Finally, statistical analysis and SHAP plot visualization were employed to explore the relationship between baseline information and CAD. Results: The study included 269 CAD patients and 131 healthy controls. The eXtreme Gradient Boosting (XGBoost) model performed best among the models for predicting CAD, with an area under the receiver operating characteristic curve of 0.728 (95% CI 0.623–0.824). The main correlates were left ventricular ejection fraction, homocysteine, and hemoglobin (p < 0.001). The XGBoost model also performed best for predicting the SYNTAX score, with the main correlates being brain natriuretic peptide (BNP), left ventricular ejection fraction, and glycated hemoglobin (p < 0.001). The main relevant features in the model predicting the GENSINI score were BNP, high-density lipoprotein, and homocysteine (p < 0.001). Conclusions: This data-driven approach provides a foundation for the risk stratification and severity assessment of CAD. Clinical Trial Registration: The study was registered in the www.clinicaltrials.gov protocol registration system (number NCT05018715).
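The abstract reports an area under the ROC curve with a 95% confidence interval but does not say how the interval was obtained. As a minimal sketch, assuming a percentile bootstrap (an assumption, not the authors' stated method), the following pure-Python example computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, then resamples to get an interval. The function names `auc` and `bootstrap_auc_ci` and the toy labels and scores are hypothetical and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((alpha / 2) * n_boot)]
    high = stats[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Toy data for illustration only (1 = CAD, 0 = no CAD).
labels = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.5, 0.9, 0.3]

point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores, n_boot=500)
print(f"AUC = {point:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

With a test set of 116 participants, as in the study, an interval of this kind is expected to be fairly wide, which is consistent with the reported range of 0.623–0.824.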
Objective: To investigate the association of red cell distribution width (RDW) and the RDW to platelet count ratio (RPR) with cardiovascular diseases (CVDs), and to further investigate whether the association involves population differences and dose–response relationships. Design: Cross-sectional population-based study. Setting: The National Health and Nutrition Examination Survey (1999–2020). Participants: A total of 48 283 participants aged 20 years or older (CVD, n=4593; non-CVD, n=43 690) were included in this study. Primary and secondary outcome measures: The primary outcome was the presence of CVD, while the secondary outcome was the presence of specific CVDs. Multivariable logistic regression analysis was performed to determine the relationship between RDW or the RPR and CVD. Subgroup analyses were performed to test the interactions between demographic variables and their associations with disease prevalence. Results: In a logistic regression model fully adjusted for potential confounders, the ORs with 95% CIs for CVD across the second to fourth quartiles of RDW were 1.03 (0.91 to 1.18), 1.19 (1.04 to 1.37) and 1.49 (1.29 to 1.72) compared with the lowest quartile (p for trend <0.0001). The corresponding ORs with 95% CIs across the second to fourth quartiles of the RPR were 1.04 (0.92 to 1.17), 1.22 (1.05 to 1.42) and 1.64 (1.43 to 1.87) compared with the lowest quartile (p for trend <0.0001). The association of RDW with CVD prevalence was more pronounced in females and smokers (all p for interaction <0.05). The association of the RPR with CVD prevalence was more pronounced in the group younger than 60 years (p for interaction=0.022). The restricted cubic spline also suggested a linear association between RDW and CVD and a non-linear association between the RPR and CVD (p for non-linear <0.05). Conclusion: There is statistical heterogeneity in the associations of RDW and the RPR with CVD prevalence across sex, smoking status and age groups.
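To make the quartile comparison concrete, here is a minimal sketch of how an RPR value and a crude (unadjusted) odds ratio against the lowest quartile could be computed. The study itself used multivariable logistic regression adjusted for confounders, so this 2×2 calculation with a Wald interval only illustrates the quantity being reported. The function names, the units assumed for RDW and platelet count, and the counts are all hypothetical and are not data from the study.

```python
import math


def rpr(rdw_percent, platelet_count):
    """RDW to platelet count ratio; units (% and 10^9/L) are an assumption."""
    return rdw_percent / platelet_count


def odds_ratio_ci(exp_cases, exp_noncases, ref_cases, ref_noncases, z=1.96):
    """Crude odds ratio versus a reference group, with a Wald 95% CI."""
    odds_ratio = (exp_cases * ref_noncases) / (exp_noncases * ref_cases)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(
        1 / exp_cases + 1 / exp_noncases + 1 / ref_cases + 1 / ref_noncases
    )
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


# Hypothetical counts: highest quartile versus lowest (reference) quartile.
or_q4, low_q4, high_q4 = odds_ratio_ci(
    exp_cases=30, exp_noncases=70, ref_cases=15, ref_noncases=85
)
print(f"RPR example: {rpr(13.0, 260.0):.3f}")
print(f"Crude OR (Q4 vs Q1) = {or_q4:.2f}, 95% CI {low_q4:.2f} to {high_q4:.2f}")
```

An interval that excludes 1, as for the third and fourth quartiles in the abstract, indicates an association at the 5% level; the second-quartile intervals reported above include 1.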
Introduction: Our aim was to use machine learning (ML) models as auxiliary diagnostic tools to improve the diagnostic accuracy of non-ST-elevation myocardial infarction (NSTEMI). Materials and methods: A total of 2878 patients were included in this retrospective study, comprising 1409 patients with NSTEMI and 1469 patients with unstable angina pectoris. The clinical and biochemical characteristics of the patients were used to construct the initial attribute set. The SelectKBest algorithm was used to determine the most important features. A feature engineering method was applied to create new, strongly correlated features for training the ML models. Based on the experimental dataset, six ML models were constructed: extreme gradient boosting, support vector machine, random forest, naïve Bayes, gradient boosting machine and logistic regression. Each model was verified on test set data, and the diagnostic performance of each model was comprehensively evaluated. Results: All six ML models built on the training set were able to play an auxiliary role in the diagnosis of NSTEMI. Although the models differed in performance, the extreme gradient boosting model performed best in terms of accuracy (0.95±0.014), precision (0.94±0.011), recall (0.98±0.003) and F1 score (0.96±0.007) for NSTEMI. Conclusions: ML models constructed from clinical data can be used as auxiliary tools to improve the accuracy of NSTEMI diagnosis. According to our comprehensive evaluation, the extreme gradient boosting model performed best.
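The four metrics reported for the extreme gradient boosting model are all derived from the confusion matrix. As a minimal sketch (the function names and the confusion counts are hypothetical, not data from the study), the following shows the standard definitions and checks that the reported F1 score is consistent with the reported precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration (positive class = NSTEMI).
metrics = classification_metrics(tp=98, fp=6, fn=2, tn=94)

# Consistency check using the mean precision and recall from the abstract.
f1_check = f1_from(0.94, 0.98)
print(metrics)
print(f"F1 implied by precision 0.94 and recall 0.98: {f1_check:.4f}")
```

The harmonic mean of 0.94 and 0.98 rounds to 0.96, which matches the F1 score reported in the abstract.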