Highlights
The monitoring of HABs can be improved using ML models for chlorophyll-a prediction.
ML model selection for HABs monitoring depends on target objectives.
The random forest model predicts chlorophyll-a better than the other models when the temporal dimension is not considered.
The LSTM model is essential for making time-dependent chlorophyll-a predictions for HABs monitoring.
Abstract. The complex dynamics of freshwater harmful algal blooms (HABs) necessitate proactive monitoring approaches to mitigate their impacts. Rapid advances in computing power and statistical methods are driving the development of data-driven techniques such as machine learning (ML) models, which have proven instrumental in other fields for finding patterns that explain relationships in observed data. This study assesses the ability of ML models to monitor HABs in a lake, using chlorophyll-a concentration as the index. The selected models were regression tree, random forest (RF), multilayer perceptron (MLP), support vector regression (SVR), long short-term memory (LSTM), and gated recurrent unit (GRU) models; the last two can account for the temporal sequence of the water quality data. The RF model, with a coefficient of determination (R²) of 0.87, a mean absolute error (MAE) of 0.97 µg L⁻¹, and a root mean square error (RMSE) of 3.53 µg L⁻¹, outperformed the SVR, MLP, and regression tree models. The LSTM model, with an MAE of 2.39 µg L⁻¹ and an RMSE of 3.29 µg L⁻¹, predicted the temporal dynamics of chlorophyll-a better than the GRU model, although with a longer runtime, and showed potential for developing real-time HAB monitoring and early warning systems. The findings demonstrate the robustness of the chosen ML models and highlight factors that researchers and policymakers should weigh when selecting approaches for monitoring HABs.
Keywords: Cyanobacteria, Early warning systems, Freshwater, HABs, Machine learning models.
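For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how R², MAE, and RMSE are commonly computed from observed and predicted chlorophyll-a concentrations. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study. R² is computed here as 1 − SS_res/SS_tot, which is an assumption; the study may have used a different definition. Note that R² is dimensionless, whereas MAE and RMSE carry the units of the data (µg L⁻¹).

```python
import math


def regression_metrics(observed, predicted):
    """Return R2 (dimensionless), MAE, and RMSE (same units as the inputs).

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Root mean square error: penalises large residuals more heavily.
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # Coefficient of determination, assumed here to be 1 - SS_res / SS_tot.
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mae": mae, "rmse": rmse}


# Made-up chlorophyll-a values in micrograms per litre, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
metrics = regression_metrics(observed, predicted)
```

Because RMSE squares the residuals before averaging, it is never smaller than MAE; a large gap between the two, as in the RF results above, suggests that a few large errors dominate.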