Abstract. In this second part of the two-part paper, the data-driven modeling (DDM) experiment presented and explained in the first part is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by random sampling without replacement. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K-nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distribution of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. The results of the experiment show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies, unlike for the two nonlinear soil moisture case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice, and its performance can improve if the kernel is appropriately selected. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations, and it should not be ignored as a potential modeling technique for hydrological applications.
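As a rough illustration of the "average overall error measures" and the "deterioration rate of prediction performance" mentioned in the abstract above, the following Python sketch computes two common error measures and one possible deterioration ratio. The abstract does not say which measures were used or how deterioration was defined, so the function names and the formula in `deterioration_rate` (relative increase in error from training to testing) are assumptions made for illustration only.

```python
import numpy as np


def rmse(observed, simulated):
    """Root mean squared error between observed and simulated values."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.sqrt(np.mean((observed - simulated) ** 2)))


def mae(observed, simulated):
    """Mean absolute error between observed and simulated values."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return float(np.mean(np.abs(observed - simulated)))


def deterioration_rate(train_error, test_error):
    """Relative increase in error from training to testing.

    Assumed definition (not taken from the paper):
    (test_error - train_error) / train_error.
    """
    return (test_error - train_error) / train_error
```

With this assumed definition, a model whose RMSE rises from 2.0 in training to 3.0 in testing has a deterioration rate of 0.5, that is, a 50% loss of accuracy on unseen data.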
Abstract. A comprehensive data-driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data-driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as baselines for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and the rainfall-runoff process are described and proposed for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both predictive accuracy and uncertainty of the modeling techniques can be evaluated. The implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
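The realization procedure described in the abstract above (12 groups per dataset, each with training, cross-validation, and testing subsets drawn by random sampling) could be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the function name `make_realizations`, the 50/25/25 split fractions, and the choice to reshuffle the whole dataset for every group are assumptions, since the abstract does not give those details.

```python
import numpy as np


def make_realizations(n_samples, n_groups=12, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return n_groups tuples of (train, cv, test) index arrays.

    Each group is an independent random permutation of the sample indices,
    so within a group the three subsets are disjoint (sampling without
    replacement). The split fractions are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_train = int(fractions[0] * n_samples)
    n_cv = int(fractions[1] * n_samples)
    groups = []
    for _ in range(n_groups):
        order = rng.permutation(n_samples)  # random order, no repeated indices
        train = order[:n_train]
        cv = order[n_train:n_train + n_cv]
        test = order[n_train + n_cv:]  # remaining samples go to testing
        groups.append((train, cv, test))
    return groups


groups = make_realizations(n_samples=100)
```

Fitting each technique on every group and looking at the spread of the test errors across the 12 groups is what allows both accuracy and uncertainty to be assessed.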
Cold regions provide water resources for half the global population yet face rapid change. Their hydrology is dominated by snow, ice and frozen soils, and climate warming is having profound effects. Hydrological models have a key role in predicting changing water resources but are challenged in cold regions. Ground‐based data to quantify meteorological forcing and constrain model parameterization are limited, while hydrological processes are complex, often controlled by phase change energetics. River flows are impacted by poorly quantified human activities. This paper discusses the scientific and technical challenges of the large‐scale modelling of cold region systems and reports recent modelling developments, focussing on MESH, the Canadian community hydrological land surface scheme. New cold region process representations include improved blowing snow transport and sublimation, lateral land‐surface flow, prairie pothole pond storage dynamics, frozen ground infiltration and thermodynamics, and improved glacier modelling. New algorithms to represent water management include multistage reservoir operation. Parameterization has been supported by field observations and remotely sensed data; new methods for parameter identification have been used to evaluate model uncertainty and support regionalization. Additionally, MESH has been linked to broader decision‐support frameworks, including river ice simulation and hydrological forecasting. The paper also reports various applications to the Saskatchewan and Mackenzie River basins in western Canada (0.4 and 1.8 million km²). These basins arise in glaciated mountain headwaters, are partly underlain by permafrost, and include remote and incompletely understood forested, wetland, agricultural and tundra ecoregions. These illustrate the current capabilities and limitations of cold region modelling, and the extraordinary challenges to prediction, including the need to overcome biases in forcing data sets, which can have disproportionate effects on the simulated hydrology.
Machine learning (ML) applications in Earth and environmental sciences (EES) have gained incredible momentum in recent years. However, these ML applications have largely evolved in ‘isolation’ from the mechanistic, process‐based modelling (PBM) paradigms, which have historically been the cornerstone of scientific discovery and policy support. In this perspective, we assert that the cultural barriers between the ML and PBM communities limit the potential of ML, and even its ‘hybridization’ with PBM, for EES applications. Fundamental, but often ignored, differences between ML and PBM are discussed as well as their strengths and weaknesses in light of three overarching modelling objectives in EES, (1) nowcasting and prediction, (2) scenario analysis, and (3) diagnostic learning. The paper ponders over a ‘coevolutionary’ approach to model building, shifting away from a borrowing to a co‐creation culture, to develop a generation of models that leverage the unique strengths of ML such as scalability to big data and high‐dimensional mapping, while remaining faithful to process‐based knowledge base and principles of model explainability and interpretability, and therefore, falsifiability.
Abstract. The mining of oil sands in northern Alberta, Canada, involves the stripping and salvage of surface soil layers to gain access to the oil mines. The oil sands industry has committed to reconstructing these disturbed watersheds to replicate the performance of the natural soil horizons and to reproduce the various functions of natural watersheds. The selection of the texture and thickness of the reconstructed soil cover layers is based primarily on the concept that all covers must have sufficient moisture for vegetation over the growing season. Assessment of the hydrological performance of the reconstructed soil covers is crucial to select the best cover alternative. A generic system dynamics watershed (GSDW) model is developed, based on the existing site-specific SDW model, and applied to five reconstructed watersheds located in the Athabasca mining basin, Alberta, Canada, and one natural watershed (boreal forest) located in Saskatchewan, Canada, to simulate the various hydrological processes, in particular soil moisture patterns and actual evapotranspiration, in reconstructed and natural watersheds. The model is capable of capturing the dynamics of the water balance components in both reconstructed and natural watersheds. The developed GSDW model provides a vital tool, which enables the investigation of the utility of different soil cover alternative designs and evaluation of their performance. Moreover, the model can be used to conduct short- and long-term predictions under different climate scenarios.