There is a growing global need to predict algal blooms in lakes and reservoirs in order to better manage water quality. We applied a random forest (RF) model, a machine learning algorithm, with a sliding window strategy to forecast chlorophyll-a concentrations in the fresh water of the Urayama Reservoir and the saline water of Lake Shinji. Both water bodies are located in Japan and have historical water records spanning more than ten years. The RF model allowed us to forecast trends in the chlorophyll-a time series of both water bodies. For the reservoir, we used data from two sampling stations separately. We found that the best model settings, namely the min-leaf value and whether or not predictors were pre-selected, differed between stations in the same reservoir. The best achievable lead time and prediction accuracy also differed between the two stations. For the lake, the best combination of min-leaf and predictor pre-selection differed from that of the reservoir. Finally, the most influential variables for the RF model in the two water bodies were identified as biochemical oxygen demand (BOD), chemical oxygen demand (COD), pH, and total nitrogen/total phosphorus (TN/TP).
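The sliding window strategy mentioned above can be illustrated with a minimal sketch. It assumes that each training sample is built from the most recent `window` observations of the predictors and that the label is the chlorophyll-a value `lead` steps ahead. The abstract does not give the exact framing, so the function name `make_sliding_windows`, its parameters, and the toy values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_sliding_windows(
    predictors: Sequence[Sequence[float]],
    target: Sequence[float],
    window: int,
    lead: int,
) -> Tuple[List[List[float]], List[float]]:
    """Turn a multivariate time series into (features, label) pairs.

    Each feature row flattens the `window` most recent predictor rows
    ending at time t; the label is the target value at time t + lead.
    """
    if window < 1 or lead < 1:
        raise ValueError("window and lead must be positive")
    if len(predictors) != len(target):
        raise ValueError("predictors and target must have equal length")
    X: List[List[float]] = []
    y: List[float] = []
    # t is the index of the last observation inside the current window.
    for t in range(window - 1, len(target) - lead):
        rows = predictors[t - window + 1 : t + 1]
        X.append([value for row in rows for value in row])
        y.append(target[t + lead])
    return X, y


# Toy data (made up for illustration): two predictors per time step,
# e.g. pH and COD, plus a chlorophyll-a series of the same length.
predictors = [[7.0, 2.0], [7.1, 2.1], [7.2, 2.3], [7.4, 2.2], [7.3, 2.5], [7.5, 2.6]]
chl_a = [1.0, 1.2, 1.5, 1.9, 1.7, 2.1]
X, y = make_sliding_windows(predictors, chl_a, window=3, lead=1)
```

The resulting `X` and `y` could then be passed to any regression model, such as a random forest implementation; increasing `lead` corresponds to forecasting further ahead.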
In aquatic ecosystems, anthropogenic activities disrupt nutrient fluxes, thereby promoting harmful algal blooms that can directly impact economies and human health. Within this framework, forecasting a proxy of chlorophyll a in coastal areas is the first step toward managing these algal blooms. The primary goal of this study was to analyze, using a machine learning model, how phytoplankton bloom forecasts are affected by different sampling frequencies. The database used in this study was sourced from an automated system located in the English Channel, which has a sampling frequency of 20 minutes. We considered 12 physicochemical parameters over a six-year period. Our forecast methodology is based on the random forest (RF) model and a sliding window strategy. The lag times for these sliding windows ranged from 12 hours to 3 months, with four different sampling time steps of up to 1 day. The results indicate that the optimal forecast was obtained for a 20-minute time step, with an average R² of 0.62. Moreover, the highest values of fluorescence were predicted when the water temperature was approximately 11.8°C. We thus demonstrated that the sampling frequency directly affects the forecast performance of an RF model. Furthermore, this kind of model can recreate interactions that closely resemble biological processes. Our study suggests that the RF model can exploit the additional information contained in high-frequency datasets. The methodology presented here lays the foundation for a numerical decision-making tool that could help mitigate the impact of these algal blooms.
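To make the comparison across sampling frequencies concrete, the following minimal sketch shows one way a 20-minute series could be coarsened and a forecast scored. It rests on two assumptions the abstract does not confirm: that coarser time steps are obtained by averaging consecutive blocks of samples, and that R² is the coefficient of determination. The helper names `block_mean` and `r_squared` and all numeric values are hypothetical.

```python
from typing import List, Sequence


def block_mean(series: Sequence[float], factor: int) -> List[float]:
    """Average consecutive blocks of `factor` samples.

    A trailing partial block, if any, is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be positive")
    n_blocks = len(series) // factor
    return [
        sum(series[i * factor : (i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant observed series")
    return 1.0 - ss_res / ss_tot


# Six made-up 20-minute fluorescence readings become two hourly means (factor 3).
fluorescence = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hourly = block_mean(fluorescence, 3)

# Score a made-up forecast against made-up observations.
score = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Repeating the train-and-score cycle on series coarsened with different `factor` values would give one R² per time step, which is the kind of comparison the abstract summarizes.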