Water is one of the most important resources for human life and health. Global climate change, industrialization and urbanization pose serious threats to existing water resources. Water quality has traditionally been assessed through expensive, time-consuming laboratory and statistical analysis. Machine learning algorithms, however, can be applied to determine the water quality index quickly and efficiently in real time. With this motivation, a dataset obtained from the Kaggle website was used to classify water quality in this research. Some features in the dataset contained missing values. These null values were handled with traditional methods (dropping, mean imputation) and with a regression-based method. After the null values were filled in, Random Forest (RF), AdaBoost and XGBoost were applied for binary classification. Grid search and random search were used for hyperparameter optimization. Among all the algorithms used, the SXH hybrid method, built from Support Vector Regression (SVR) and XGBoost, showed the best classification performance, with an accuracy and F1-score of 99.4%. Comparison of our results with previous similar studies showed that our SVR-XGBoost Hybrid (SXH) model had the best performance in terms of accuracy and F1-score. The performance of our proposed model indicates that hybrid machine learning methods can provide an innovative perspective on potable water quality.