Diarrhea is an endemic disease in Indonesia, characterized by three or more passages of liquid stool per day. According to WHO, diarrhea is the second leading cause of death among children under five. Records of under-five children with diarrhea are scarce, so the limited information available makes data analysis difficult. This difficulty can be addressed by rebalancing the data so that the class ratios are even. A popular method for doing so is SMOTE (Synthetic Minority Over-sampling Technique). To handle the imbalanced data and improve classification performance, this study combines SMOTE with several ensemble techniques on diarrhea cases among under-five children in Indonesia. The ensemble models used are Random Forest (RF), Adaptive Boosting (ADA), and XGBoost (XGB), with Decision Tree (DT) as the baseline method. The results show that all SMOTE-based methods perform competitively, with SMOTE-XGB achieving slightly higher accuracy (0.88), precision (0.96), and f1-score (0.86). Applying SMOTE improved the recall, precision, and f1-score metrics and yielded a higher AUC for all methods (DT, RF, ADA, and XGB). This study is useful for addressing class imbalance problems in official statistics data provided by BPS Statistics Indonesia.
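The rebalancing step described above can be illustrated with a minimal sketch of SMOTE-style oversampling in plain Python. This is not the study's implementation: the function names `smote` and `rebalance`, the toy data, and the choice of Euclidean distance are assumptions made for illustration. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. In practice one would more likely use a library such as imbalanced-learn and then train the classifiers (DT, RF, ADA, XGB) on the rebalanced training set only.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def rebalance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (invented): six majority samples (label 0), two minority samples (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.1, 0.4], [0.4, 0.2], [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = rebalance(X, y)
print(y_bal.count(0), y_bal.count(1))
```

Because the synthetic points are interpolations rather than copies, the minority class is enlarged without exact duplicates. Oversampling should be applied after the train/test split so that the evaluation metrics (accuracy, precision, recall, f1-score, AUC) are computed on untouched data.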