Aims/hypothesis
Previous research using data-driven cluster analysis in individuals with newly diagnosed diabetes has proposed five novel subgroups of diabetes based on six measured variables. Our aims were (1) to validate the existence of distinct clusters within type 2 diabetes, and (2) to compare the cluster method with an alternative strategy that uses traditional prediction methods to predict diabetes outcomes.
Methods
We used data from the Swedish National Diabetes Register and included 114,231 individuals with newly diagnosed type 2 diabetes. k-means clustering was used to identify clusters based on nine continuous variables (age at diagnosis, HbA1c, BMI, systolic and diastolic BP, LDL- and HDL-cholesterol, triacylglycerol and eGFR). The elbow method was used to determine the optimal number of clusters, and Cox regression models were used to evaluate the risks of mortality and CVD events. The prediction models were compared using concordance statistics.
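The clustering step can be illustrated with a minimal, standard-library Python sketch of k-means (Lloyd's algorithm) together with the elbow curve, i.e. the total within-cluster sum of squares (WCSS) for each k. This is not the study's code: the function names, the z-score standardisation, the single random initialisation and the simulated nine-variable, one-cluster Gaussian dataset are all assumptions made for illustration. Real analyses would normally use an established implementation with several random starts.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def standardise(rows: List[Point]) -> List[Point]:
    """Convert each variable to z-scores (an assumed preprocessing step)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / (len(c) - 1)) ** 0.5
           for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def kmeans(points: List[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points as starting centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster keeps its centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged: neither assignments nor centroids changed
        labels, centroids = new_labels, new_centroids
    return centroids, labels


def within_cluster_ss(points: List[Point], centroids: List[Point],
                      labels: List[int]) -> float:
    """Total within-cluster sum of squares (WCSS)."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow_curve(points: List[Point], k_max: int = 10) -> List[float]:
    """WCSS for k = 1..k_max; plotted against k, this is the elbow plot."""
    return [within_cluster_ss(points, *kmeans(points, k)) for k in range(1, k_max + 1)]


# Hypothetical data: 200 individuals, 9 variables, drawn from ONE Gaussian cluster.
rng = random.Random(42)
one_cluster = [tuple(rng.gauss(0.0, 1.0) for _ in range(9)) for _ in range(200)]
curve = elbow_curve(standardise(one_cluster), k_max=10)
for k, wcss in enumerate(curve, start=1):
    print(f"k={k:2d}  WCSS={wcss:8.1f}")
```

Because the simulated data contain a single cluster, any decline in WCSS as k grows reflects only the partitioning of one continuous cloud of points; plotting `curve` against k shows whether an apparent elbow emerges anyway. With k = 1 the WCSS equals the total sum of squares, which for standardised data is (n - 1) times the number of variables.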
Results
The elbow plot, with values of k ranging from 1 to 10, showed a smooth curve without any clear cut-off point, making the optimal value of k unclear. The plot looked very similar to an elbow plot made from a simulated dataset consisting of only one cluster. In prediction models for mortality, the concordance was 0.63 (95% CI 0.63, 0.64) for two clusters, 0.66 (95% CI 0.65, 0.66) for four clusters, 0.77 (95% CI 0.76, 0.77) for the ordinary Cox model and 0.78 (95% CI 0.77, 0.78) for the Cox model with smoothing splines. In prediction models for CVD events, the concordance was 0.64 (95% CI 0.63, 0.65) for two clusters, 0.66 (95% CI 0.65, 0.67) for four clusters, 0.77 (95% CI 0.77, 0.78) for the ordinary Cox model and 0.78 (95% CI 0.77, 0.78) for the Cox model with splines for all variables.
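The concordance statistics reported above can be illustrated with a sketch of Harrell's C-index, computed from follow-up times, event indicators and a predicted risk score. This is a hedged illustration, not the study's implementation: the abstract does not name the concordance estimator used, pairs tied on follow-up time are simply skipped here (software packages differ on this), no confidence intervals are computed, and the O(n²) double loop is only practical for small samples. The toy data are hypothetical and only show why a coarse two-group score, such as a cluster label, yields many tied pairs.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is usable when subject i had an event and a shorter
    follow-up time than subject j. The pair is concordant when i also has
    the higher predicted risk; tied risks count as half-concordant.
    Pairs tied on follow-up time are skipped in this sketch.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data (not from the register).
time = [2, 4, 6, 8, 10]                       # follow-up time
event = [1, 1, 0, 1, 0]                       # 1 = event observed, 0 = censored
continuous_score = [0.9, 0.7, 0.5, 0.3, 0.1]  # e.g. a linear predictor from a Cox model
two_groups = [1, 1, 0, 0, 0]                  # e.g. a high/low-risk cluster label

print(harrell_c(time, event, continuous_score))  # 1.0
print(harrell_c(time, event, two_groups))        # 0.875
```

On these hypothetical data, the continuous score orders every usable pair correctly, whereas the two-group score ties pairs within the same group, and each tied pair contributes only 0.5.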
Conclusions/interpretation
This nationwide observational study found no evidence supporting the existence of a specific number of distinct clusters within type 2 diabetes. Our results suggest that a prediction model using simple clinical features to predict the risk of diabetes complications would be more useful than sub-stratification into clusters.
Graphical abstract