Wastewater treatment companies face several challenges: improving energy efficiency, meeting increasingly stringent water quality standards, and realizing the potential for resource recovery. Over the past decades, computational models have become recognized as effective tools for addressing some of these challenges, improving the economic and operational efficiency of wastewater treatment plants (WWTPs). Numerous deterministic, stochastic, and time-series-based models have been developed to predict WWTP performance. Mechanistic models, which incorporate physical and empirical knowledge, remain the dominant predictive approach. However, these models simplify reality, which introduces model structure uncertainty and creates a continual need for calibration. As the amount of available data grows, data-driven models are becoming more attractive. In data-driven models, the structure is not explicitly specified; instead, it is determined by searching for relationships in the available data. Implementing predictive models has the potential to transform how companies manage WWTPs by enabling digital twins that simulate processes in (near) real time. The main objective of this review is therefore to discuss machine learning models applied to predicting WWTP effluent characteristics and wastewater inflows, detecting anomalies, and optimizing energy consumption in WWTPs. Furthermore, an overview of hybrid models, which combine mechanistic and machine learning models, is presented as a promising approach. Finally, the main gaps and future directions for the mathematical modeling of wastewater treatment processes are critically assessed, with a focus on the explainability of data-driven models and the use of transfer learning.