BACKGROUND
Early diagnosis of diabetes is essential for timely interventions that slow the progression of dysglycemia and its comorbidities. Diabetes screening that relies on blood testing is not widely adopted, especially among high-risk groups.
OBJECTIVE
This study aims to investigate the potential use of automated machine learning (AutoML) models and self-reported data in detecting undiagnosed diabetes among U.S. adults.
METHODS
Individual-level data, including biochemical tests for diabetes, demographic characteristics, family history of diabetes, anthropometric measures, dietary intakes, health behaviors, and chronic conditions, were retrieved from the National Health and Nutrition Examination Survey, 1999-2020. Undiagnosed diabetes was defined as having no prior self-reported diagnosis but meeting diagnostic criteria based on hemoglobin A1c, fasting plasma glucose, or 2-h plasma glucose. The H2O AutoML framework was used to automate the machine learning workflow. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and accuracy.
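The outcome definition above can be sketched as a small labeling rule. This is a minimal illustration, not the study's code: the abstract does not state the cutoffs, so the thresholds below are assumed to be the standard American Diabetes Association diagnostic values, and the function names, argument names, and the choice to exclude participants with a self-reported diagnosis are hypothetical.

```python
from typing import Optional

# Assumed ADA diagnostic thresholds (the abstract does not state them).
A1C_THRESHOLD = 6.5      # hemoglobin A1c, %
FPG_THRESHOLD = 126.0    # fasting plasma glucose, mg/dL
PG2H_THRESHOLD = 200.0   # 2-h plasma glucose, mg/dL


def meets_criteria(a1c: Optional[float],
                   fpg: Optional[float],
                   pg_2h: Optional[float]) -> bool:
    """True if any available test meets or exceeds its threshold."""
    checks = [
        (a1c, A1C_THRESHOLD),
        (fpg, FPG_THRESHOLD),
        (pg_2h, PG2H_THRESHOLD),
    ]
    # Missing tests (None) are skipped rather than treated as normal.
    return any(value is not None and value >= cutoff
               for value, cutoff in checks)


def label_participant(self_reported_dx: bool,
                      a1c: Optional[float] = None,
                      fpg: Optional[float] = None,
                      pg_2h: Optional[float] = None) -> Optional[int]:
    """Return 1 for undiagnosed diabetes, 0 for no diabetes.

    Returns None for participants with a prior self-reported diagnosis,
    who are assumed here to fall outside the two study groups.
    """
    if self_reported_dx:
        return None
    return 1 if meets_criteria(a1c, fpg, pg_2h) else 0


if __name__ == "__main__":
    print(label_participant(False, a1c=6.7))            # elevated A1c
    print(label_participant(False, a1c=5.4, fpg=98.0))  # all tests below cutoffs
    print(label_participant(True, a1c=7.2))             # already diagnosed
```

A rule of this shape produces the binary target used to train the classifier, while the self-reported variables serve as predictors.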
RESULTS
The study included 11,815 participants: 2,256 with undiagnosed diabetes and 9,559 without diabetes. The mean age was 59.76 years for those with undiagnosed diabetes and 46.78 years for those without. The trained AutoML model achieved an AUC of 0.909 and an accuracy of 86.5% in the test set. The model demonstrated a sensitivity of 70.26%, specificity of 90.46%, positive predictive value of 64.10%, and negative predictive value of 92.61% for distinguishing undiagnosed diabetes from no diabetes.
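The relationship among the reported metrics can be illustrated with Bayes' rule: given sensitivity, specificity, and prevalence, the predictive values and accuracy follow directly. The sketch below is illustrative only. It assumes the test-set prevalence equals the overall sample proportion (2,256 of 11,815), which the abstract does not confirm, and the function name is hypothetical.

```python
def predictive_values(sensitivity: float,
                      specificity: float,
                      prevalence: float) -> dict:
    """Derive PPV, NPV, and accuracy from sensitivity, specificity, prevalence."""
    # Expected fraction of the population in each confusion-matrix cell.
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
    }


# Sensitivity and specificity as reported for the test set.
SENSITIVITY = 0.7026
SPECIFICITY = 0.9046
# Assumption: test-set prevalence matches the full-sample proportion.
PREVALENCE = 2256 / 11815

result = predictive_values(SENSITIVITY, SPECIFICITY, PREVALENCE)

if __name__ == "__main__":
    for name, value in result.items():
        print(f"{name}: {value:.3f}")
```

Under that prevalence assumption, the derived values land within about one percentage point of the reported positive predictive value, negative predictive value, and accuracy. The same function also shows how the positive predictive value would fall if the model were applied to a population with a lower prevalence of undiagnosed diabetes.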
CONCLUSIONS
This study is the first to use an AutoML model to detect undiagnosed diabetes in U.S. adults. The model’s strong discriminative performance and applicability to the broader U.S. population make it a promising tool for large-scale diabetes screening.