Background

Prediction or early diagnosis of diabetes is crucial for populations at high risk of the disease.

Methods

In this study, we assessed the ability of five popular classifiers (J48, AdaBoostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features: age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from annual physical examination reports for adults at the Shengjing Hospital of China Medical University during January–April 2017. Weka data mining software was used to identify the best algorithm for diabetes classification.

Results

The results indicate that the decision tree classifier J48 has the best performance (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most significant feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke.

Conclusions

Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and will enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, which can potentially improve early diabetes diagnosis and reduce burdens on the healthcare system.
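
As an illustration of how the performance measures reported in the Results (accuracy, precision, recall, and F-measure) relate to one another, the following minimal sketch computes them from a confusion matrix. It assumes that precision, recall, and F-measure are averaged over classes with weights proportional to class size, as in Weka's "Weighted Avg." summary row; whether the reported figures were computed this way is an assumption. The function name `weighted_metrics` and the example counts are hypothetical and are not taken from the study data.

```python
from typing import Dict, List


def weighted_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and class-weighted precision, recall, and F-measure.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precision_w = recall_w = f_measure_w = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])  # samples truly in class k
        predicted = sum(confusion[i][k] for i in range(n_classes))  # samples predicted as k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total  # class share of the whole sample
        precision_w += weight * p
        recall_w += weight * r
        f_measure_w += weight * f

    return {
        "accuracy": correct / total,
        "precision": precision_w,
        "recall": recall_w,
        "f_measure": f_measure_w,
    }


# Hypothetical counts for illustration only (rows: true class, columns: predicted class).
example = [[90, 10],
           [5, 95]]
metrics = weighted_metrics(example)
print(metrics)
```

Under this weighting, recall is always identical to accuracy, which would be consistent with the near-identical accuracy (0.9503) and recall (0.950) reported for J48.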