Accurate forecasting of wastewater treatment plant (WWTP) key features helps in understanding and predicting plant behavior, which in turn supports process design and control, improves system reliability, reduces operational costs, and aids optimization of overall performance. Deep learning methods, which are proven data-driven soft-sensors, are well suited to WWTP applications because they can handle process nonlinearity and the dynamic nature of environmental data. This study adopts deep learning-based models as soft-sensors to forecast WWTP key features, such as influent flow, influent temperature, influent biochemical oxygen demand (BOD), effluent chloride, effluent BOD, and power consumption. We constructed six deep learning models derived from long short-term memory (LSTM) and gated recurrent unit (GRU), namely traditional LSTM and GRU, the exponentially smoothed LSTM, and the adaptive version of LSTM and smoothed LSTM. Exponential smoothing is expected to reduce the effect of outliers and improve forecasting accuracy, while the adaptive models are intended to enhance the ability of the LSTM to follow the trend of future data quickly and accurately. We compared the performance of these models with bidirectional LSTM (BiLSTM) and seasonal decomposition using local regression. Historical records from a coastal municipal WWTP in Saudi Arabia are used to verify the effectiveness of the investigated models. The proposed models provide promising forecasting results and require no assumptions about the data distributions. In terms of efficiency, GRU-based models converge faster than LSTM-based models. In terms of accuracy, the LSTM soft-sensor achieves the best overall results across all key features, followed by the exponentially smoothed GRU and LSTM. By contrast, the adaptive models achieved the lowest forecasting accuracy of the models compared. These findings can help practitioners achieve data-driven WWTP management.
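As a rough illustration of why exponential smoothing can dampen the effect of outliers before a series reaches a recurrent model, the following minimal sketch applies simple exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_(t-1), to a short series containing one spike. The function name, the smoothing factor `alpha = 0.3`, and the sample readings are assumptions made for illustration only; the abstract does not state how smoothing is combined with the LSTM or GRU in this study.

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1).

    `alpha` is a hypothetical choice, not a value reported in the study.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]  # initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical readings with a single outlier at index 2 (not real plant data).
readings = [10.0, 10.0, 50.0, 10.0, 10.0]
smoothed = exponential_smoothing(readings, alpha=0.3)
print(smoothed)
```

In this sketch the spike is pulled toward the recent level of the series instead of being passed through at full magnitude, at the cost of a slower return to baseline afterwards. A smaller `alpha` suppresses outliers more strongly but also makes the smoothed series slower to follow genuine changes.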