“…In particular, some small-scale WWTPs lack real-time sensors and do not monitor all of the key parameters, such as the influent phosphorus concentration, making it challenging to establish ML/DL models. Therefore, applying ML/DL algorithms via “big data” can be difficult in small-scale WWTPs; big data are generally characterized by five dimensions: volume (quantity and amount of data), velocity (speed of data generation), variety (type, nature, and format of data), veracity (trustworthiness and quality of captured data), and value (insights and impact). For example, ML models were applied to the coagulation/flocculation process in a WWTP equipped with online sensors, and the essential variables, such as influent total phosphorus (TP), effluent TP, coagulant dose, and flocculant dose, were collected to predict the effluent TP with a maximum R² of 0.76; however, this kind of data collection seems to be unachievable for some small-scale WWTPs. Thus, when the data set is incomplete, feature selection matters for identifying the variables that are correlated with the predicted output.…”
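
As a rough illustration of feature selection on an incomplete data set, the following minimal sketch ranks candidate input variables by the absolute Pearson correlation with effluent TP, using only the rows where both values are present. It is not the method of the cited study: the data are synthetic, the variable names (`coagulant_dose`, `flow`, `ph`, `unrelated`) and the 20% missing rate are assumptions made for the example, and correlation ranking is just one simple filter method among many.

```python
import numpy as np


def rank_features_by_correlation(X, y, names):
    """Rank features by |Pearson r| with y, using pairwise-complete rows."""
    scores = {}
    for j, name in enumerate(names):
        # Keep only rows where both the feature and the target were measured.
        mask = ~np.isnan(X[:, j]) & ~np.isnan(y)
        if mask.sum() < 3 or np.std(X[mask, j]) == 0:
            scores[name] = 0.0  # too few points or constant feature
            continue
        r = np.corrcoef(X[mask, j], y[mask])[0, 1]
        scores[name] = abs(r)
    # Highest absolute correlation first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic data (assumption): stands in for routine plant measurements.
rng = np.random.default_rng(0)
n = 200
coagulant_dose = rng.uniform(10, 50, n)
flow = rng.uniform(100, 500, n)
ph = rng.normal(7.2, 0.3, n)
unrelated = rng.normal(0, 1, n)

# Hypothetical relationship: effluent TP falls with dose, rises with flow.
effluent_tp = 2.0 - 0.03 * coagulant_dose + 0.002 * flow + rng.normal(0, 0.1, n)

X = np.column_stack([coagulant_dose, flow, ph, unrelated])
names = ["coagulant_dose", "flow", "ph", "unrelated"]

# Simulate an incomplete data set: drop about 20% of feature readings at random.
X[rng.random(X.shape) < 0.2] = np.nan

ranking = rank_features_by_correlation(X, effluent_tp, names)
for name, score in ranking:
    print(f"{name}: |r| = {score:.2f}")
```

Pairwise deletion keeps every available reading for each variable instead of discarding whole rows, which matters when sensors are sparse. It assumes values are missing at random, and a linear correlation filter will miss nonlinear or interaction effects.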