Smoking is a major public health issue and a significant contributor to premature death. In recent years, numerous decision support systems for smoking cessation have been developed based on machine learning methods. However, the inevitable class imbalance is considered a major challenge in deploying such systems. In this paper, we present an empirical comparison of machine learning techniques for dealing with the class imbalance problem in the prediction of smoking cessation intervention among the Korean population. To address class imbalance, the objective of this paper is to improve prediction performance by using two synthetic oversampling techniques: the synthetic minority over-sampling technique (SMOTE) and adaptive synthetic (ADASYN) sampling. The experimental design comprises three components. First, the most representative features are selected in two phases: the lasso method and multicollinearity analysis. Second, newly balanced data are generated using the SMOTE and ADASYN techniques. Third, machine learning classifiers are applied to construct prediction models for all subjects and for each gender. To assess the effectiveness of the prediction models, the F-score, type I error, type II error, balanced accuracy and geometric mean indices are used. Comprehensive analysis demonstrates that the Gradient Boosting Trees (GBT), Random Forest (RF) and multilayer perceptron neural network (MLP) classifiers achieved the best performance for all subjects and for each gender when SMOTE and ADASYN were utilized. The SMOTE with GBT and RF models also provide feature importance scores that enhance the interpretability of the decision-support system. In addition, the results show that the presented synthetic oversampling techniques combined with machine learning models outperformed baseline models in smoking cessation prediction.
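To illustrate the oversampling step, the following is a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its nearest minority-class neighbours. It is not the implementation used in the study; the function name `smote`, its parameters, and the toy data are assumptions made for illustration. ADASYN follows the same interpolation scheme but generates more samples around minority points that have many majority-class neighbours, which this sketch does not do.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    `minority` is a list of equal-length numeric tuples (hypothetical input
    format chosen for this sketch) and must contain at least two points.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of `base`, excluding `base` itself
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (illustrative values only).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class. In practice, oversampling is normally applied to the training split only, so that evaluation data contain no synthetic samples.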
Hypertension is a serious medical condition that significantly increases the risk of chronic diseases. Early detection of individuals at risk for hypertension makes it possible to prevent or delay the incidence of related diseases and strokes. In recent years, numerous studies have focused on decision support systems for predicting hypertension. However, class imbalance is a commonly occurring problem in real-world applications. In this paper, we present an end-to-end cost-sensitive neural network (COST-NN) framework combined with a weighted random forest-based feature selection technique to predict hypertension among Korean adults. First, the framework identifies the most representative features using the weighted random forest-based feature selection technique. Then, COST-NN is applied to classify individuals as hypertensive or non-hypertensive. To identify the most accurate predictive model, we compare COST-NN with various baseline models. Experimental results showed that COST-NN outperforms the state-of-the-art baseline models. In addition, the presented framework is expected to be applicable not only to hypertension but also to supporting the prevention of various other diseases in at-risk patients.
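One common way to make a neural network cost-sensitive is to weight each example's loss by the inverse frequency of its class, so that errors on the rare class are penalized more heavily. The sketch below shows this for binary cross-entropy. The abstract does not specify the cost scheme used by COST-NN, so the weighting formula, the function names `class_weights` and `weighted_bce`, and the toy labels are all assumptions for illustration.

```python
import math


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * cnt) for c, cnt in counts.items()}


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Class-weighted binary cross-entropy, normalized by the total weight.

    `labels` are 0/1 targets and `probs` are predicted probabilities of
    class 1 (hypothetical input format chosen for this sketch).
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / sum(weights[y] for y in labels)


# Toy imbalanced labels: 8 negatives, 2 positives (illustrative only).
labels = [0] * 8 + [1] * 2
weights = class_weights(labels)

# One confident mistake on a minority example vs. one on a majority example.
miss_minority = weighted_bce(labels, [0.1] * 8 + [0.9, 0.1], weights)
miss_majority = weighted_bce(labels, [0.9] + [0.1] * 7 + [0.9, 0.9], weights)
```

With these weights, a single confident mistake on a minority-class example contributes more to the loss than the same mistake on a majority-class example, which pushes the optimizer away from simply predicting the majority class.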