Background
Diabetes is a serious and progressive medical condition that demands efficient diagnostic methods, especially since its symptoms overlap with those of other medical conditions. While various studies have explored early detection of diabetes across different age groups, little attention has been paid specifically to middle-aged adults. This study focused on this demographic, aiming to assess associations between symptoms and diabetes status, investigate the relevance and relative influence of symptomatic and demographic features in the prediction of diabetes, and identify the most efficient machine learning (ML) model for predicting diabetes.

Methods
Using a dataset from a previous study conducted at the Sylhet Diabetes Hospital in Sylhet, Bangladesh, comprising 520 diabetic and non-diabetic participants, we extracted and analyzed demographic and symptom-related information for the 296 middle-aged adults aged 40 to 60 years. We evaluated symptom-diabetes associations with chi-square tests and used the Boruta algorithm to investigate the relevance and influence of each feature. Seven ML models, namely K-Nearest Neighbor (KNN), Naïve Bayes (NB) classifier, Support Vector Machines with linear, polynomial, and radial basis function kernels, Random Forest (RF) classifier, and Logistic Regression, were then assessed for predictive performance.

Results
Of the 296 participants in this study, 179 (60%) were diabetic. Chi-square tests showed significant associations between diabetes status in middle-aged adults and polyuria, polydipsia, weakness, sudden weight loss, partial paresis, polyphagia, and visual blurring. All features studied, including demographics and symptoms, were confirmed as relevant for predicting diabetes in middle-aged adults.
Notably, polyuria, polydipsia, gender, alopecia, irritability, and sudden weight loss were identified as the most influential features. Among the seven ML models, RF showed the highest sensitivity (98.59%), while KNN had the highest specificity (97.83%). RF also achieved the best accuracy (96.58%) and area under the curve (96.00%), making it the most efficient ML model for predicting diabetes among middle-aged adults.

Conclusion
The findings of this study emphasize the importance of using diabetes-related symptoms for early detection of diabetes within the middle-aged adult population. The RF model demonstrated robust diagnostic capabilities, underscoring its potential for predicting diabetes in middle-aged adults. Further exploration of genetic, lifestyle, and environmental factors is warranted to improve understanding and diagnostic accuracy in this demographic.
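
To make the statistical steps named in the Methods and Results concrete, the following is a minimal Python sketch of two of them: a Pearson chi-square test of association between one binary symptom and diabetes status, and the sensitivity, specificity, and accuracy computed from a confusion matrix. It is an illustration only, not the study's actual code. The function names and all counts are hypothetical, and the chi-square statistic is computed without a continuity correction, which may differ from the settings used in the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 table.

    Assumed table layout (counts are supplied by the caller):
                     diabetic   non-diabetic
        symptom yes     a            b
        symptom no      c            d

    Returns (statistic, p_value) for 1 degree of freedom,
    with no continuity correction applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Shortcut formula for the 2x2 Pearson statistic.
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only for illustration (not study data).
stat, p = chi_square_2x2(30, 10, 10, 30)
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)

print(f"chi-square = {stat:.2f}, p = {p:.2e}")
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A symptom would be reported as significantly associated with diabetes status when its p-value falls below the chosen threshold (commonly 0.05). Comparing models on sensitivity and specificity separately, as the Results do, shows why one model can lead on one metric while another leads on the other.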