This article analyzes the determinants of diabetes using a dataset of survey records for 2000 patients from the Frankfurt Hospital in Germany. The data were analyzed with the following models: Tobit, Probit, Logit, Multinomial Logit, OLS, and heteroskedasticity-corrected WLS. The results show that the presence of diabetes is positively associated with "Pregnancies", "Glucose", "BMI", "Diabetes Pedigree Function", and "Age", and negatively associated with "Blood Pressure". A cluster analysis was performed using the fuzzy c-Means algorithm, with the number of clusters selected by the Elbow method; three clusters were found. Finally, eight machine learning algorithms were compared to select the one that best predicts the probability that a patient develops diabetes.