“…Ad hoc benchmarking and model intercomparison studies are common (e.g., Andréassian et al., 2009; Best et al., 2015; Kratzert et al., 2019b; Lane et al., 2019; Berthet et al., 2020; Nearing et al., 2018), and while the community has a (quickly growing) large-sample dataset for benchmarking hydrological models (Newman et al., 2017; Kratzert et al., 2019b), we lack standardized, open procedures for conducting comparative uncertainty estimation studies. Note that of the references above, only Berthet et al. (2020) focused on benchmarking uncertainty estimation strategies, and then only for assessing postprocessing approaches. We previously argued that data-based models provide a meaningful and general benchmark for testing hypotheses and models (Nearing and Gupta, 2015; Nearing et al., 2020b), and here we develop a set of data-based uncertainty estimation benchmarks built on a standard, publicly available, large-sample dataset that could serve as a baseline for future benchmarking studies.…”