<abstract> <p>A crucial problem when applying classification algorithms is class imbalance. In an imbalanced dataset, particularly a two-class one, one class contains far more observations than the other, so the majority class dominates the minority class during learning. The synthetic minority over-sampling technique (SMOTE) was developed to handle the classification of imbalanced datasets. SMOTE increases the number of minority samples by interpolating between each minority sample and its nearest minority neighbors. The SMOTE algorithm has three critical parameters: "k", "perc.over", and "perc.under". The "perc.over" and "perc.under" hyperparameters determine the ratio of minority to majority class samples, and the "k" parameter is the number of nearest neighbors used to create new minority class instances. Finding the best parameter values for SMOTE is complicated. To address this issue, a hybrid of the genetic algorithm (GA) and the support vector machine (SVM) was proposed for selecting the SMOTE parameters. Three scenarios were created. In Scenario 1, the SVM was evaluated without applying SMOTE. In Scenario 2, the SVM was applied after SMOTE, without the GA. In Scenario 3, the SVM was applied after the SMOTE parameters had been optimized with the GA. The study used two imbalanced datasets: drug use data and simulated data. The results of the three scenarios were then compared using model performance metrics, which showed that the third scenario achieved the highest performance. The study thus shows that a genetic algorithm can optimize the class ratios and the k hyperparameter to improve the performance of the SMOTE algorithm.</p> </abstract>
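The interpolation step that SMOTE relies on can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `k_nearest` and `smote` are hypothetical, Euclidean distance is assumed, and `perc_over` is treated as a multiple of 100 (200 means two synthetic samples per minority sample), mirroring the idea of the "perc.over" parameter. The "perc.under" step, which subsamples the majority class, is omitted.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(idx: int, points: Sequence[Point], k: int) -> List[int]:
    """Indices of the k nearest minority neighbours of points[idx] (Euclidean)."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()  # ties broken by index
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Point], perc_over: int, k: int, seed: int = 0) -> List[Point]:
    """Generate synthetic minority samples by interpolation (hypothetical sketch).

    Assumption: perc_over is a multiple of 100; perc_over // 100 synthetic
    samples are created for every original minority sample.
    """
    rng = random.Random(seed)
    per_sample = perc_over // 100
    synthetic: List[Point] = []
    for i, x in enumerate(minority):
        neighbours = k_nearest(i, minority, k)
        for _ in range(per_sample):
            nb = minority[rng.choice(neighbours)]  # pick one of the k neighbours
            gap = rng.random()                     # uniform in [0, 1)
            # New point lies on the segment between x and its neighbour.
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

In the third scenario, values such as `k` and `perc_over` would be chosen by the GA rather than fixed by hand; a larger `k` spreads synthetic points over a wider neighbourhood of each minority sample.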
The traumatic traces that suicide leaves in a society and the emotional devastation caused by these losses make it very important to determine the causes of suicide. This study used data on the number of suicides in Turkey's 81 provinces in 2019. The effects of the factors affecting suicide, together with spatial differences in suicide, were analyzed and predicted with geographically weighted regression (GWR) models. GWR models were fitted with different kernel functions, and the best GWR model was obtained with the bisquare kernel function. The factors affecting the number of suicides were identified as the human development index, the proportion of internet users, and the number of unemployed people. The results showed that the number of suicides in different provinces was affected by different factors. In addition, the observed 2019 suicide numbers and the predicted values were mapped, and the two maps were found to be quite similar. The province with the highest number of suicides in the country was Istanbul.
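The core of GWR, a separate weighted least-squares fit at each location with weights from the bisquare kernel, can be sketched as below. This is an illustration under assumptions, not the study's code: a fixed bandwidth and planar Euclidean distances between province coordinates are assumed, the design matrix carries an explicit intercept column, and the function names are hypothetical. The bisquare weight is w = (1 − (d/b)²)² for d < b and 0 otherwise, and the local coefficients at location i are β(i) = (XᵀW(i)X)⁻¹XᵀW(i)y.

```python
import math
from typing import List, Sequence, Tuple


def bisquare_weight(d: float, bandwidth: float) -> float:
    """Bisquare kernel: (1 - (d/b)^2)^2 inside the bandwidth, 0 outside."""
    if d >= bandwidth:
        return 0.0
    return (1.0 - (d / bandwidth) ** 2) ** 2


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gwr_local_coefficients(
    site: Tuple[float, float],
    coords: Sequence[Tuple[float, float]],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    bandwidth: float,
) -> List[float]:
    """Weighted least squares at one site: beta = (X^T W X)^-1 X^T W y."""
    w = [bisquare_weight(math.dist(site, c), bandwidth) for c in coords]
    n, p = len(y), len(X[0])
    xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xtwy = [sum(w[i] * X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(xtwx, xtwy)
```

Repeating the fit at every province yields location-specific coefficients, which is what allows different factors to dominate in different provinces.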
Education is the foundation of economic, social, and cultural development for every individual and for society as a whole. In Turkey, students are admitted to secondary education institutions through the high school entrance examination administered by the Ministry of National Education. In this study, the success rates of students who took the high school entrance examination in Turkey's 81 provinces in 2019 were analyzed with machine learning regression models and a beta regression model. The paper aims to model, predict, and explain students' success rates using variables such as the divorce rate, gross domestic product, illiteracy, and the population with higher education. Support vector regression, random forest, decision tree, and beta regression models were applied to estimate success rates. The two models with the highest R² values were the beta regression and random forest models. A comparison of their prediction errors showed that the random forest model performed better than the beta regression model in predicting success rates. While the beta regression model best predicted the success rate of Çanakkale province, the random forest model predicted the success rate of Ankara well. In addition, the variables found to be significant in the beta regression model were also important in the random forest model. Using both the beta regression and random forest models is recommended for estimating students' success rates.
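Because success rates are proportions strictly between 0 and 1, beta regression models them directly. A minimal sketch of the log-likelihood that such a model maximizes is given below, assuming the common mean-precision parametrization with a logit link, y ~ Beta(μφ, (1 − μ)φ) with logit(μ) = xᵀβ. The study does not state its exact specification, and the function names are hypothetical. Fitting would maximize this function numerically over β and φ, which is not shown.

```python
import math
from typing import Sequence


def inv_logit(eta: float) -> float:
    """Inverse of the logit link: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_reg_loglik(
    beta: Sequence[float],
    phi: float,
    X: Sequence[Sequence[float]],
    y: Sequence[float],
) -> float:
    """Log-likelihood of a beta regression (mean-precision form, logit link).

    Each y must lie strictly inside (0, 1); phi > 0 is the precision.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        mu = inv_logit(sum(b * x for b, x in zip(beta, xi)))
        a, b_ = mu * phi, (1.0 - mu) * phi  # shape parameters of Beta(a, b)
        total += (
            math.lgamma(phi) - math.lgamma(a) - math.lgamma(b_)
            + (a - 1.0) * math.log(yi)
            + (b_ - 1.0) * math.log(1.0 - yi)
        )
    return total
```

The precision φ controls how tightly provincial success rates cluster around their predicted mean μ; a larger φ means a smaller variance μ(1 − μ)/(1 + φ).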
Machine learning is a field of artificial intelligence that allows computers to model and predict future events by drawing inferences from past information through mathematical and statistical operations. In this study, tree-based regression models, one family of machine learning methods, were used to determine and predict the effect of the health indicators of 191 countries on the human development index (HDI) from 2014 to 2018. When the tree-based regression models were compared according to model performance criteria, the best model was the gradient boosting model, with the highest R² (0.9962) and the smallest RMSE (0.0094). According to the gradient boosting model, the three most important variables for HDI are, in order, current health expenditure per capita, physicians, and nurses and midwives. For the ten countries with the highest HDI values and for Turkey, HDI values were estimated for 2018-2019 with the gradient boosting model. The countries for which HDI values are best predicted by the gradient boosting method are the Netherlands,
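The two criteria used to rank the tree-based models, R² and RMSE, can be computed as in the sketch below. The helper names are hypothetical, and any values passed to them in practice would be observed and predicted HDI values; the numbers used for checking here are toy values, not data from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE is expressed in the units of the target, so for HDI, which lies between 0 and 1, an RMSE of 0.0094 corresponds to an average error of roughly one hundredth of an index point.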