Aims: Identification, a priori, of those at high risk of progression from pre-diabetes to diabetes may enable targeted delivery of interventional programmes while avoiding the burden of prevention and treatment in those at low risk. We studied whether a machine-learning model can improve the prediction of incident diabetes using patient data from electronic medical records.

Methods: A machine-learning model predicting the progression from pre-diabetes to diabetes was developed using gradient boosted trees. The model was trained on data from The Health Improvement Network (THIN) database cohort, internally validated on THIN data not used for training, and externally validated on the Canadian AppleTree and the Israeli Maccabi Health Services (MHS) data sets. The model's predictive ability was compared with that of a logistic-regression model within each data set.

Results: A cohort of 852 454 individuals with pre-diabetes (glucose ≥ 100 mg/dL and/or HbA1c ≥ 5.7%) was used for model training, comprising 4.9 million time points and 900 features. The full model was eventually implemented using 69 variables, generated from 11 basic signals. The machine-learning model demonstrated superiority over the logistic-regression model, which was maintained at all sensitivity levels. AUC [95% CI] for the machine-learning vs the logistic-regression model was 0.865 [0.860, 0.869] vs 0.778 [0.773, 0.784] (P < .05) in the THIN data set, 0.907 [0.896, 0.919] vs 0.880 [0.867, 0.894] (P < .05) in the AppleTree data set, and 0.925 [0.923, 0.927] vs 0.876 [0.872, 0.879] (P < .05) in the MHS data set.

Conclusions: Machine-learning models preserve their performance across populations in diabetes prediction, and can be integrated into large clinical systems, leading to judicious selection of persons for interventional programmes.

Keywords: electronic medical records, machine learning, pre-diabetes
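The AUC figures with 95% confidence intervals reported above can be illustrated with a minimal sketch. The abstract does not say how the intervals were computed, so the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_auc_ci` are hypothetical. The AUC itself is computed from the rank-sum identity, which equals the probability that a randomly chosen progressor receives a higher score than a randomly chosen non-progressor.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes for AUC to be defined
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, not study data: 1 = progressed to diabetes, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))  # 0.75
```

To compare two models on one data set, the same labels would be passed with each model's scores, giving one AUC and one interval per model.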
Identification, a priori, of subjects at high risk of progression from prediabetes to diabetes may enable targeted delivery of interventional programs, while avoiding the burden of prevention and treatment in those at low risk. This study relies on the NHS THIN database cohort of 2,761,222 persons with at least 2 glucose measurements during an average follow-up of 6 years. Prediabetes was diagnosed in 470,107 persons, with 4.8% of them progressing annually to diabetes. We constructed a non-linear model identifying those at high risk of annual progression based on all available patient data and history. The major variables contributing to the model were glucose, HbA1c, BMI, age and gender. The clinically acceptable cutoffs of HbA1c ≥ 6.0% and/or glucose ≥ 110 mg/dL identified 76.5% of those who actually progressed to diabetes (sensitivity), while labelling 33.2% of the population as high risk (positivity rate). Setting our model at the same sensitivity yielded a lower positivity rate of 22.0%, thereby identifying the same number of progressors while labelling a significantly smaller population as high risk. Our model also outperformed a simple logistic regression based on glucose, HbA1c, BMI, age and gender (Table). In conclusion, our algorithm enables judicious selection of the target population for a clinical intervention, with a higher positive predictive value, thus leading to cost saving.

Disclosure: A. Cahn: Advisory Panel; Self; AstraZeneca, Novo Nordisk Inc. Research Support; Self; AstraZeneca. Consultant; Self; GlucoMe. Stock/Shareholder; Self; GlucoMe. Advisory Panel; Self; Eli Lilly and Company. Speaker's Bureau; Self; Novo Nordisk Inc., Eli Lilly and Company, AstraZeneca. Advisory Panel; Self; Sanofi. Speaker's Bureau; Self; Sanofi. Advisory Panel; Self; Boehringer Ingelheim Pharmaceuticals, Inc. Speaker's Bureau; Self; Boehringer Ingelheim Pharmaceuticals, Inc., Merck Sharp & Dohme Corp.
Consultant; Self; Medial EarlySign. A. Shoshan: Employee; Self; Medial Research. T. Sagiv: Employee; Self; Medial EarlySign. R. Yesharim: None. I. Raz: Advisory Panel; Self; AstraZeneca. Consultant; Self; AstraZeneca. Speaker's Bureau; Self; AstraZeneca. Advisory Panel; Self; Boehringer Ingelheim GmbH. Speaker's Bureau; Self; Boehringer Ingelheim GmbH. Advisory Panel; Self; Eli Lilly and Company. Speaker's Bureau; Self; Eli Lilly and Company. Stock/Shareholder; Self; DarioHealth. Advisory Panel; Self; Merck Sharp & Dohme Corp., Novo Nordisk Inc. Speaker's Bureau; Self; Novo Nordisk Inc. Advisory Panel; Self; Orgenesis Inc., Pfizer Inc., Sanofi R&D, SmartZyme Biopharma. Consultant; Self; Bristol-Myers Squibb Company. Speaker's Bureau; Self; Bristol-Myers Squibb Company, Johnson & Johnson Diabetes Institute, LLC., Merck Sharp & Dohme Corp., Novartis Pharma K.K., Sanofi-Aventis. Consultant; Self; FuturRx Ltd, Insuline Medical, Camereyes Ltd, Exscopia, Medial EarlySign Ltd. Stock/Shareholder; Self; Glucome Ltd, InsuLine Medical Ltd. Consultant; Self; Dermal Biomics Inc. Stock/Shareholder; Self; Orgenesis Inc. Speaker's Bureau; Self; Teva Pharmaceutical Industries Ltd. Advisory Panel; Self; Concenter BioPharma/Silkim Ltd, Camereyes Ltd. Stock/Shareholder; Self; CameraEyes Ltd. Advisory Panel; Self; Breath of Life Pharma Ltd, Panaxia. R. Goshen: Consultant; Self; Medial EarlySign Ltd.
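The matched-sensitivity comparison in the second abstract (76.5% sensitivity at a positivity rate of 33.2% for the fixed cutoffs vs 22.0% for the model) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the toy data are invented, and the positive predictive value (PPV) calculation assumes the 4.8% annual progression rate can stand in for the prevalence of progressors in the evaluated population. Under that assumption, PPV = sensitivity × prevalence / positivity rate.

```python
import math


def sensitivity_and_positivity(labels, flagged):
    """Return (sensitivity, positivity rate) for a binary high-risk flag."""
    n_progressors = sum(labels)
    true_pos = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    n_flagged = sum(1 for f in flagged if f)
    return true_pos / n_progressors, n_flagged / len(labels)


def threshold_for_sensitivity(labels, scores, target):
    """Highest threshold t such that flagging score >= t reaches the target sensitivity.

    With tied scores at the threshold, the achieved sensitivity may exceed the target.
    """
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    k = max(1, math.ceil(target * len(pos) - 1e-9))  # progressors that must be flagged
    return pos[k - 1]


def ppv(sensitivity, positivity_rate, prevalence):
    """PPV = true positives / all flagged = sensitivity * prevalence / positivity rate."""
    return sensitivity * prevalence / positivity_rate


# Figures from the abstract; treating 4.8% as the prevalence is an assumption.
ANNUAL_PROGRESSION = 0.048
ppv_cutoffs = ppv(0.765, 0.332, ANNUAL_PROGRESSION)  # fixed HbA1c/glucose cutoffs
ppv_model = ppv(0.765, 0.220, ANNUAL_PROGRESSION)    # model at matched sensitivity
print(round(ppv_cutoffs, 3), round(ppv_model, 3))

# Toy data (invented): match a model's threshold to a target sensitivity of 75%.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.5, 0.4, 0.3, 0.1, 0.05]
t = threshold_for_sensitivity(toy_labels, toy_scores, 0.75)
toy_flags = [s >= t for s in toy_scores]
print(sensitivity_and_positivity(toy_labels, toy_flags))
```

Whatever prevalence is assumed, the ratio of the two PPVs at equal sensitivity depends only on the positivity rates: 33.2 / 22.0 ≈ 1.51, so roughly one and a half times as many of those flagged by the model would be true progressors.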