Estimating the risk of long-term diabetes complications is essential to medical decision making. Guidelines for the management of Type 2 Diabetes Mellitus (T2DM) advocate calculating the Cardiovascular Disease (CVD) risk in order to initiate appropriate treatment. The objective of this study is to investigate the use of sophisticated machine learning techniques for developing personalized models able to predict the risk of fatal or nonfatal CVD incidence in T2DM patients. The important challenge of handling the unbalanced nature of the available dataset is addressed by applying novel ensemble strategies. Hybrid Wavelet Neural Networks (HWNNs) and Self-Organizing Maps (SOMs) constitute the primary models for building ensembles following a subsampling approach. Different methods for combining the decisions of the primary models are applied and comparatively assessed. Data from the 5-year follow-up of 560 patients with T2DM are used for development and evaluation purposes. The highest discrimination performance (Area Under the Curve (AUC): 71.48%) is achieved by taking into account the outputs of both the HWNN-based and SOM-based primary models. The proposed method is superior to the Binomial Linear Regression (BLR) model, justifying the need to apply more sophisticated techniques in order to produce reliable CVD risk scores.
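The subsampling idea described above can be illustrated with a minimal sketch. It assumes binary labels (1 = CVD event, 0 = no event) and pairs all minority-class cases with an equally sized random draw from the majority class for each primary model. The abstract does not specify the combination rules, so the simple averaging and majority voting shown here are assumptions, and all function names are hypothetical.

```python
import random


def balanced_subsamples(labels, n_models, seed=0):
    """Return n_models index lists, each holding every minority-class case
    plus an equally sized random draw (without replacement) from the
    majority class. Assumed scheme, not necessarily the authors' own."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_models)]


def average_vote(scores):
    """Combine primary-model risk scores by simple averaging (assumed rule)."""
    return sum(scores) / len(scores)


def majority_vote(scores, threshold=0.5):
    """Combine thresholded primary-model decisions by majority (assumed rule)."""
    votes = sum(1 for s in scores if s >= threshold)
    return 1 if votes * 2 > len(scores) else 0
```

Each index list would be used to train one primary model (an HWNN or a SOM in the study), and the combiner would then be applied to the outputs of the trained models for a new patient.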
The aim of the present study is to comparatively assess the performance of different machine learning and statistical techniques with regard to their ability to estimate the risk of developing type 2 diabetes mellitus (Case 1) and cardiovascular disease complications (Case 2). This is the first work investigating the application of ensembles of artificial neural networks (EANN) to estimate the 5‐year risk of developing type 2 diabetes mellitus and cardiovascular disease as a long‐term diabetes complication. The performance of the proposed models has been compared with that obtained by applying logistic regression, Bayesian‐based approaches, and decision trees. The models' discrimination and calibration have been evaluated using the classification accuracy (ACC), the area under the curve (AUC) criterion, and the Hosmer–Lemeshow goodness of fit test. The obtained results demonstrate the superiority of the proposed models (EANN) over the other models. In Case 1, EANN with different topologies has achieved high discrimination and good calibration performance (ACC = 80.20%, AUC = 0.849, p value = .886). In Case 2, EANN based on bagging has resulted in good discrimination and calibration performance (ACC = 92.86%, AUC = 0.739, p value = .755).
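As a rough illustration of the evaluation criteria named above, the following sketch computes the AUC through its pairwise-ranking interpretation and the Hosmer–Lemeshow statistic over equal-sized risk groups. The function names and the grouping scheme are assumptions for illustration, not the authors' implementation. The p value is not computed here; it would conventionally be read from a chi-square distribution with (groups − 2) degrees of freedom.

```python
def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_statistic(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic: sort cases by predicted risk, split them
    into equal-sized groups, and compare observed with expected events."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        size = len(chunk)
        # Skip degenerate groups whose variance term would be zero.
        if 0 < expected < size:
            stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat
```

A small statistic (and hence a large p value, as in the results reported above) indicates that predicted risks agree with observed event rates across the risk groups.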