2021
DOI: 10.1016/j.advwatres.2021.103920

A workflow to address pitfalls and challenges in applying machine learning models to hydrology

Cited by 34 publications (10 citation statements)
References 71 publications
“…In hydrological forecasting, all principal ML approaches have recently been used, ranging from artificial neural networks (ANNs) to deep learning models, such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, support vector machines (SVMs and SVRs), decision trees, and random forests [8,10,21,67]. Each of the approaches has its specific characteristics, justifying its use for flood modeling in the given context [69,70]. The LSTM and ANN models have proven their ability to capture temporal dependencies and flexibility and provide simulation over long time periods; however, they typically require complex and large datasets for training [7,68].…”
Section: Discussion (mentioning)
confidence: 99%
“…In regression and classification, a best practice is to use k‐fold cross‐validation that partitions available data into k‐sets and iteratively trains the model using data from each set for testing, which leads to better model generalizability (Bergmeir & Benítez, 2012). Common pitfalls in model design are including excess irrelevant or redundant variables as inputs, variable selection bias (i.e., using the same data for training and inputs), resubstitution validation (i.e., testing the model with training data), use of inconsistent cross‐validation and resampling procedures across model architectures being implemented, and data leakage (e.g., using testing data for model training or hyperparameter optimization, or pre‐processing the entire dataset prior to splitting the data into cross‐validation folds), which can lead to overfitting (Gharib & Davies, 2021; Zhang, 2007). These pitfalls can be avoided by understanding the details and limitations of the models being implemented, following best practices, and using robust ML workflows (Gharib & Davies, 2021; Zhang, 2007).…”
Section: Opportunities For Advancement Of Water Quality ML (mentioning)
confidence: 99%
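
The leakage pitfall named in the statement above (pre-processing the entire dataset before splitting it into folds) can be illustrated with a minimal sketch. It is not taken from Gharib & Davies (2021) or from the citing paper; the function names (`kfold_indices`, `fit_scaler`, `fit_line`, `cross_validate`), the one-variable linear model, and the RMSE score are all assumptions chosen for illustration. The key point is that the scaling statistics are computed from the training folds only and then applied to the held-out fold, and that the held-out fold is never used for fitting (no resubstitution).

```python
import random
from statistics import mean, pstdev


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_scaler(xs):
    """Return (mean, std) of xs; std falls back to 1.0 if xs is constant."""
    return mean(xs), (pstdev(xs) or 1.0)


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def cross_validate(xs, ys, k=5, seed=0):
    """Return one RMSE per fold, fitting scaler and model on training folds only."""
    folds = kfold_indices(len(xs), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        # Scaling statistics come from the training folds only (no leakage).
        mu, sd = fit_scaler([xs[j] for j in train_idx])
        train_x = [(xs[j] - mu) / sd for j in train_idx]
        train_y = [ys[j] for j in train_idx]
        a, b = fit_line(train_x, train_y)
        # The held-out fold is transformed with the training statistics.
        sq_errs = [(a * (xs[j] - mu) / sd + b - ys[j]) ** 2 for j in test_idx]
        scores.append(mean(sq_errs) ** 0.5)
    return scores


# Usage on synthetic data (hypothetical values, for illustration only).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
print(cross_validate(xs, ys, k=5))
```

Shuffled folds assume roughly independent samples. For autocorrelated hydrological time series, contiguous (blocked) folds may be the more appropriate choice, and the same train-only fitting rule would still apply.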
“…These pitfalls can be avoided by understanding the details and limitations of the models being implemented, following best practices, and using robust ML workflows (Gharib & Davies, 2021; Zhang, 2007).…”
Section: State-of-the-art Machine Learning In River Water Quality Models (mentioning)
confidence: 99%
“…The success of such approaches is due as well to the mentioned increasing data availability and to the complexity of hydrological phenomena, which are difficult to model with linear or simple non linear statistical methods. For a full overview on the use of ML methods in hydrology, the reader could refer to the following recent papers: Zounemat-Kermani et al (2021), Gharib and Davies (2021), Rajaee et al (2020), Tyralis et al (2021).…”
Section: Introduction (mentioning)
confidence: 99%