Global River Models (GRMs), which simulate river flow and flood
processes, have developed rapidly in recent decades. However, these
advances call for meaningful, standardized quality assessments and
comparisons against a suitable set of observational variables using
appropriate metrics, a capability that is currently lacking within GRM
communities. This study proposes a benchmark system designed to
facilitate the assessment of river models and enable comparisons
against established benchmarks. The benchmark system
incorporates satellite remote sensing data, including water surface
elevation and inundation extent information, with necessary
preprocessing. Consequently, the benchmark system covers a larger
geographical area than traditional GRM evaluations that rely solely on
in-situ river discharge measurements. A set of evaluation and
comparison metrics has been developed, including a quantile-based
comparison metric that allows for a comprehensive analysis of multiple
simulation outputs. A test application of the benchmark system to a
global river model (CaMa-Flood) with diverse runoff inputs shows that
using bias-corrected runoff data improves model performance across
various observational variables and performance metrics. The current
iteration of the benchmark system is
suitable for global-scale assessments and can effectively evaluate the
impact of model development as well as facilitate intercomparisons among
different models. The source code is accessible at
https://doi.org/10.5281/zenodo.10903211.