Abstract. Advances in space-based observations have made it possible to develop regional- to global-scale estimates of evaporation, offering insights into this key component of the hydrological cycle. However, evaluating large-scale evaporation retrievals is not straightforward. While a number of studies have intercompared these evaporation products, either by examining the variance amongst them or by comparing pixel-scale retrievals against ground-based observations, there is a need to explore more appropriate techniques to comprehensively evaluate remote-sensing-based estimates. One possible approach is to establish the level of agreement between products describing related hydrological components: for instance, how well do evaporation patterns and responses match precipitation or water storage changes? To assess the suitability of this "consistency"-based approach for evaluating evaporation products, we focused our investigation on four globally distributed basins in arid and semi-arid environments: the Colorado River basin, Niger River basin, Aral Sea basin, and Lake Eyre basin. To assess retrieval quality, three satellite-based global evaporation products based on different methodologies and input data, namely CSIRO-PML, the MODIS Global Evapotranspiration product (MOD16), and Global Land Evaporation: the Amsterdam Methodology (GLEAM), were evaluated against rainfall data from the Global Precipitation Climatology Project (GPCP) along with Gravity Recovery and Climate Experiment (GRACE) water storage anomalies. To ensure a fair comparison, we evaluated consistency using a degree correlation approach after transforming both evaporation and precipitation data into spherical harmonics. Overall, we found no persistent hydrological consistency in these dryland environments.
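As an illustration of the degree correlation idea, the sketch below assumes the commonly used definition in which, for each spherical harmonic degree n, the cross-power of two fields is normalized by the square root of the product of their individual degree powers. The function name `degree_correlation` and the triangular list layout (`x[n][m]` for 0 <= m <= n) are hypothetical choices made for this example; the exact formulation and normalization used in the study may differ.

```python
import math


def degree_correlation(c1, s1, c2, s2):
    """Per-degree correlation between two spherical harmonic expansions.

    Assumed layout (illustrative only): each argument is a triangular
    table where x[n][m] holds the cosine (c) or sine (s) coefficient of
    degree n and order m, with 0 <= m <= n.
    Returns a list r where r[n] is the correlation at degree n.
    """
    result = []
    for n in range(len(c1)):
        cross = power1 = power2 = 0.0
        for m in range(n + 1):
            # Cross-power and individual powers summed over all orders m
            cross += c1[n][m] * c2[n][m] + s1[n][m] * s2[n][m]
            power1 += c1[n][m] ** 2 + s1[n][m] ** 2
            power2 += c2[n][m] ** 2 + s2[n][m] ** 2
        denom = math.sqrt(power1 * power2)
        # Undefined when either field has no power at this degree
        result.append(cross / denom if denom > 0 else float("nan"))
    return result


# Toy coefficients up to degree 2 (made-up numbers, not real data)
c = [[1.0], [0.5, 0.2], [0.1, -0.3, 0.4]]
s = [[0.0], [0.0, 0.7], [0.0, 0.2, -0.1]]
r_same = degree_correlation(c, s, c, s)
```

Under this definition, values near 1 indicate that the two fields share the same spatial pattern at that degree, values near 0 indicate no linear relationship, and negative values indicate opposing patterns.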
Indeed, the degree correlation oscillated between periods of low and high water storage change, with a phase difference of about 2 to 3 months. Interestingly, after imposing a simple lag on the GRACE data to account for delayed surface runoff or baseflow components, the degree correlation improved in the Niger River basin. Significant improvements in the degree correlations (from approximately 0 to about 0.6) were also found in the Colorado River basin for both the CSIRO-PML and GLEAM products, while MOD16 showed only half of that improvement. In the other basins, the variability in the temporal pattern of degree correlations remained considerable and hindered any clear differentiation between the evaporation products. Even so, a constant lag of 2 months provided a better fit than the alternatives, including a zero lag. From a product assessment perspective, no significant or persistent advantage could be discerned for any of the three evaporation products in terms of a sustained hydrological consistency with precipitation and water storage anomaly data. As a result, our analysis has implications in terms of the confidence that can be placed in ind...