From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling

Tsai, Wen-Ping; Feng, Dapeng; Pan, Ming; Beck, Hylke E.; Lawson, Kathryn; Yang, Yuan; Liu, Jiangtao; Shen, Chaopeng

doi:10.1038/s41467-021-26107-z

Cited by 143 publications

(163 citation statements)

References 51 publications

(68 reference statements)

Supporting

Mentioning

162

Contrasting


“…The data used here represent the best-instrumented sites from USGS, and 415 locations are only a tiny fraction of the millions of river reaches in the United States. In the future, the combination of process-based modelling and machine learning may allow more robust predictions on a global scale, which are already started by other scholars (Jia et al., 2021; Karpatne et al., 2018; Read et al., 2019; Tsai et al., 2021).…”

Section: Further Discussion (mentioning)

confidence: 92%

Deep learning approaches for improving prediction of daily stream temperature in data‐scarce, unmonitored, and dammed basins

Rahmani

Oliver

Lawson

et al. 2021

Hydrological Processes

Self Cite


Basin‐centric long short‐term memory (LSTM) network models have recently been shown to be an exceptionally powerful tool for stream temperature (Ts) temporal prediction (training in one period and predicting in another period at the same sites). However, spatial extrapolation is a well‐known challenge to modelling Ts and it is uncertain how an LSTM‐based daily Ts model will perform in unmonitored or dammed basins. Here we compiled a new benchmark dataset consisting of >400 basins across the contiguous United States in different data availability groups (DAG, meaning the daily sampling frequency) with and without major dams, and studied how to assemble suitable training datasets for predictions in basins with or without temperature monitoring. For prediction in unmonitored basins (PUB), LSTM produced a root‐mean‐square error (RMSE) of 1.129°C and an R2 of 0.983. While these metrics declined from LSTM's temporal prediction performance, they far surpassed traditional models' PUB values, and were competitive with traditional models' temporal prediction on calibrated sites. Even for unmonitored basins with major reservoirs, we obtained a median RMSE of 1.202°C and an R2 of 0.984. For temporal prediction, the most suitable training set was the matching DAG that the basin could be grouped into (for example, the 60% DAG was most suitable for a basin with 61% data availability). However, for PUB, a training dataset including all basins with data was consistently preferred. An input‐selection ensemble moderately mitigated attribute overfitting. Our results indicate there are influential latent processes not sufficiently described by the inputs (e.g., geology, wetland covers), but temporal fluctuations can still be predicted well, and LSTM appears to be a highly accurate Ts modelling tool even for spatial extrapolation.



“…This analysis is not intended to be a formal SBI or model calibration; rather, the purpose is to further explore the validity of our emulators. This approach is much more simple than other calibration approaches that might employ an evolution search algorithm [41], gradient-based method to adjust parameters in a series of more limited model simulations [42] or even use ML approaches to replace the calibration routine [43]. Given this proof of concept, future work should include more complex frameworks, including those that loop parameters back to the original physical model simulation or use a more formalized Bayesian framework [44].…”

Section: Parameter Evaluation (mentioning)

confidence: 99%

“…lations [42] or even use ML approaches to replace the calibration routine [43]. Given this proof of concept, future work should include more complex frameworks, including those that loop parameters back to the original physical model simulation or use a more formalized Bayesian framework [44].…”

Section: Base-case Model Performance and In Range Test Cases (mentioning)

confidence: 99%

A Physics-Informed, Machine Learning Emulator of a 2D Surface Water Model: What Temporal Networks and Simulation-Based Inference Can Help Us Learn about Hydrologic Processes

Maxwell

Condon

2021

Water


While machine learning approaches are rapidly being applied to hydrologic problems, physics-informed approaches are still relatively rare. Many successful deep-learning applications have focused on point estimates of streamflow trained on stream gauge observations over time. While these approaches show promise for some applications, there is a need for distributed approaches that can produce accurate two-dimensional results of model states, such as ponded water depth. Here, we demonstrate a 2D emulator of the Tilted V catchment benchmark problem with solutions provided by the integrated hydrology model ParFlow. This emulator model can use 2D Convolution Neural Network (CNN), 3D CNN, and U-Net machine learning architectures and produces time-dependent spatial maps of ponded water depth from which hydrographs and other hydrologic quantities of interest may be derived. A comparison of different deep learning architectures and hyperparameters is presented with particular focus on approaches such as 3D CNN (that have a time-dependent learning component) and 2D CNN and U-Net approaches (that use only the current model state to predict the next state in time). In addition to testing model performance, we also use a simplified simulation based inference approach to evaluate the ability to calibrate the emulator to randomly selected simulations and the match between ML calibrated input parameters and underlying physics-based simulation.


“…Such gradient information is extremely useful for solving previously difficult or unsolvable problems. For example, Tsai et al. (2021) recently proposed a novel differentiable parameter learning (dPL) framework to integrate big-data DL and differentiable PBMs for parameter calibration. Another example is the use of a deep learning surrogate model to perform riverine bathymetry inversion (Ghorbanidehno et al., 2021).…”

Section: Introduction (mentioning)

confidence: 99%

Surrogate Model for Shallow Water Equations Solvers with Deep Learning

Song

Liu

2021

Preprint

Self Cite

