Abstract

We developed a prediction model for cyanobacterial blooms in the lower Han River, South Korea, using decision tree algorithms. Decision trees are a machine learning method that is robust to missing values and outliers. Although simple to apply, they can accurately predict complex natural phenomena. To improve the robustness of the model, we used the ensemble methods Bagging, AdaBoost, and Random Forest, and compared the performance of each against that of a single decision tree. The indicators of cyanobacterial blooms, namely chlorophyll-a concentration and cyanobacteria cell count, were classified into either the non-exceedance or the exceedance class according to administrative guidelines or criteria, and used as the response variables. Because far fewer cyanobacteria cell count observations fell in the exceedance class than in the non-exceedance class, the synthetic minority over-sampling technique (SMOTE) was used to mitigate the imbalance between classes. The prediction abilities for chlorophyll-a and cyanobacteria were evaluated using multiple indices, including the area under the curve (AUC). The results showed that the ensemble models outperformed the single model by 1.7%-11.1% for chlorophyll-a and by 1.5%-4.9% for cyanobacteria. Applying SMOTE to the imbalanced cyanobacteria cell count data improved AUC by 4.3%-6.7%. The variable importance analysis indicated that water temperature, flow, and month were essential factors for predicting the cyanobacteria classes.
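To illustrate the over-sampling step, the following is a minimal sketch of the core SMOTE idea: each synthetic exceedance-class sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the sample values are assumptions made for illustration; the abstract does not state which implementation or settings were used in the study.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical exceedance-class samples (illustrative values only):
# (water temperature in deg C, flow in m^3/s).
exceedance = [(26.1, 110.0), (27.4, 95.0), (25.8, 130.0), (28.0, 88.0)]
new_points = smote(exceedance, n_synthetic=6, k=3)
print(len(new_points))
```

Because every synthetic point is interpolated between two real minority samples, it stays inside the range of the observed exceedance data. In practice, over-sampling of this kind is normally applied only to the training split, so that the evaluation data contain real observations only.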