Although artificial intelligence (AI), including machine learning (ML) and deep learning (DL), is recognized as an emerging and promising tool, applying it becomes challenging when data collection is incomplete. Here, lacking influent phosphorus load and chemical dosage data for phosphorus removal, we employed ML/DL models to predict effluent phosphorus using nine years of data from a small-scale wastewater treatment plant. We attempted to select essential model input features from 42 variables by using Pearson correlation analysis to reveal internal correlations among the variables. First, five ML regression models were used to predict the effluent phosphorus load; the support vector machine model achieved the highest coefficient of determination (R²), 0.637. Then, a DL model, long short-term memory (LSTM), predicted the phosphorus load one day in advance with an R² of 0.496. Finally, on the basis of the historical data, an anomaly alarm design was proposed to minimize the chance of exceeding the discharge permit; after comparing seven ML classification models, it achieved a maximum accuracy of 79.7% in predicting the phosphorus concentration. This study provides an example of applying AI for process improvement and potential cost reduction with incomplete data sets.
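The feature-screening step and the R² metric mentioned above can be illustrated with a minimal sketch. The abstract does not state the selection rule, so the correlation threshold, the helper names (`pearson_r`, `select_features`, `r_squared`), and the toy variables (`flow`, `ph`, `temp`) are all assumptions made for illustration, not the authors' procedure or data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Keep features whose |r| with the target meets the threshold.

    The threshold is a hypothetical cutoff chosen for this sketch.
    """
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (invented for illustration only).
columns = {
    "flow": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ph": [7.0, 7.0, 7.0, 7.0, 7.0],
    "temp": [5.0, 3.0, 4.0, 1.0, 2.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

selected = select_features(columns, target, threshold=0.5)
print(selected)
```

In this toy case the constant `ph` column is dropped, while `flow` and `temp` are kept because both correlate strongly (positively and negatively, respectively) with the target. A real screening would also inspect correlations among the features themselves to remove redundant inputs.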
Anaerobic digestion (AD) of sludge is a key approach to recovering useful bioenergy from wastewater treatment, and its stable operation is important to a wastewater treatment plant (WWTP). Because the underlying biochemical processes are not fully understood, AD operation can be affected by many parameters, and modeling AD processes therefore becomes a useful tool for monitoring and controlling their operation. In this case study, a robust AD model for predicting biogas production was developed using an ensemble machine learning (ML) model based on data from a full-scale WWTP. Eight ML models were examined for predicting biogas production, and three of them were selected as metamodels to create a voting model. This voting model had a coefficient of determination (R²) of 0.778 and a root mean square error (RMSE) of 0.306, outperforming the individual ML models. The Shapley additive explanation (SHAP) analysis revealed that returning activated sludge and temperature of wastewater influent were important features, although they affected biogas production in different ways. The results of this study demonstrate the feasibility of using ML models to predict biogas production in the absence of high-quality input data and of improving model prediction by assembling a voting model.

Practitioner Points
• Machine learning is applied to model biogas production from anaerobic digesters at a full-scale wastewater treatment plant.
• A voting model is created from selected individual models and exhibits better prediction performance.
• In the absence of high-quality data, indirect features are identified as important to predicting biogas production.

KEYWORDS: anaerobic digestion, ensembled model, sensitivity analysis, wastewater treatment plant

Both Jianpeng Zhou and Zhen He are WEF members.
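The idea of a voting model for regression can be sketched as follows: each base model produces a prediction, and the ensemble reports their (optionally weighted) average. The abstract does not say which three models were combined or how they were weighted, so the equal weights, the helper names (`voting_predict`, `rmse`), and the toy predictions below are assumptions for illustration only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def voting_predict(predictions, weights=None):
    """Combine per-sample predictions from several base models.

    predictions: list of equal-length prediction lists, one per base model.
    weights: optional per-model weights; equal weights are assumed if omitted.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * model[i] for w, model in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Toy observations and base-model outputs (invented for illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]  # consistently overestimates
model_b = [0.5, 1.5, 2.5, 3.5]  # consistently underestimates
model_c = [1.3, 1.7, 3.3, 3.7]  # mixed errors

ensemble = voting_predict([model_a, model_b, model_c])

for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, round(rmse(y_true, pred), 3))
```

Averaging helps here because the base models' errors partly cancel. When the base models share the same bias, a voting model gains little, which is one reason to select metamodels with differing behavior.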