Abstract. Terrestrial water storage (TWS) is an integrative hydrological state that is key for our understanding of the global water cycle. The TWS observation from GRACE missions has, therefore, been instrumental in calibration and validation of hydrological models and understanding the variations of the hydrological storages. The models, however, still show significant uncertainties in reproducing observed TWS variations, especially for the interannual variability (IAV) at the global scale. Here, we diagnose the regions dominating the variance of globally integrated TWS IAV, and sources of the errors in two data-driven hydrological models that were calibrated against global TWS, snow water equivalent, evapotranspiration, and runoff data: 1) a parsimonious process-based hydrological model, the Strategies to INtegrate Data and BiogeochemicAl moDels (SINDBAD) framework, and 2) a machine learning-physically based hybrid hydrological model (H2M) that combines a dynamic neural network with a water balance concept. While both models agree with GRACE that global TWS IAV is largely driven by the semi-arid regions of southern Africa, Indian subcontinent and northern Australia, and the humid regions of northern South America and Mekong River Basin, the models still show errors such as overestimation of the observed magnitude of TWS IAV at the global scale. Our analysis identifies modeling error hotspots of the global TWS IAV mostly in the tropical regions including Amazon, sub–Saharan regions, and Southeast Asia, indicating that the regions that dominate global TWS IAV are not necessarily the same as those that dominate the error in global TWS IAV. Excluding those error hotspot regions in the global integration yields large improvements of simulated global TWS IAV, which implies that model improvements can focus on improving processes in these hotspot regions. Further analysis indicates that error hotspot regions are associated with lateral flow dynamics, including both sub-pixel moisture convergence and across pixel lateral river flow, or with interactions between surface processes and groundwater. The association of model deficiencies with land processes that delay the TWS variation could, in part, explain why the models cannot represent the observed lagged response of TWS IAV to precipitation IAV in hotspot regions that manifest to errors in global TWS IAV. Our approach presents a general avenue to better diagnose model simulation errors for global data streams to guide efficient and focused model development for regions and processes that matter the most.