The inherent biases of different long-term gridded surface soil moisture (SSM) products, unconstrained by in situ observations, imply different spatio-temporal patterns. In this study, a Random Forest (RF) model was trained to predict SSM from relevant land surface feature variables (i.e., land surface temperature, vegetation indices, soil texture, and geographical information) and precipitation, based on the in situ soil moisture data of the International Soil Moisture Network (ISMN). The RF model achieves an RMSE of 0.05 m³ m⁻³ and a correlation coefficient of 0.9. The calculated impurity-based feature importance indicates that the Antecedent Precipitation Index is the most influential predictor of soil moisture. The geographical coordinates also significantly influence the prediction (i.e., RMSE was reduced to 0.03 m³ m⁻³ after considering geographical coordinates), followed by land surface temperature, vegetation indices, and soil texture. The spatio-temporal pattern of the RF-predicted SSM was compared with the European Space Agency Climate Change Initiative (ESA-CCI) soil moisture product, using both time-longitude and latitude diagrams. The results indicate that the RF SSM captures the spatial distribution and the daily, seasonal, and annual variabilities globally.
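The abstract names the Antecedent Precipitation Index (API) as the dominant predictor but does not state how it was computed. As a rough illustration only, the sketch below uses one commonly cited recursive form, API_t = k · API_{t-1} + P_t. The function name, the decay factor `decay`, and the sample rainfall values are assumptions made for this example and are not taken from the study.

```python
def antecedent_precipitation_index(precip_mm, decay=0.9, initial=0.0):
    """Return a daily API series using API_t = decay * API_{t-1} + P_t.

    `decay` (often written k) is an assumed recession factor between 0 and 1;
    the study's actual formulation and parameter value are not given here.
    """
    api = []
    previous = initial
    for p in precip_mm:
        # Yesterday's index decays, then today's rainfall is added.
        previous = decay * previous + p
        api.append(previous)
    return api


# Hypothetical daily rainfall in mm, chosen only to show the decay behaviour.
daily_precip = [10.0, 0.0, 0.0, 5.0]
api_series = antecedent_precipitation_index(daily_precip, decay=0.5)
print(api_series)
```

A smaller `decay` makes the index forget past rainfall faster, so it behaves like a simple memory term for recent wetness.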
Although soil moisture is a key factor in hydrologic and climate applications, global, continuous, high-resolution soil moisture datasets are still limited. Here we use physics-informed machine learning to generate a global, long-term, spatially continuous, high-resolution dataset of surface soil moisture, using International Soil Moisture Network (ISMN), remote sensing, and meteorological data, guided by knowledge of the physical processes impacting soil moisture dynamics. Global Surface Soil Moisture (GSSM1 km) provides surface soil moisture (0–5 cm) at 1 km spatial and daily temporal resolution over the period 2000–2020. The performance of the GSSM1 km dataset is evaluated with testing and validation datasets, and via inter-comparisons with existing soil moisture products. The root mean square error of GSSM1 km on the testing set is 0.05 cm³/cm³, and the correlation coefficient is 0.9. In terms of feature importance, the Antecedent Precipitation Evaporation Index (APEI) is the most important of the 18 predictors, followed by evaporation and longitude. The GSSM1 km product can support the investigation of large-scale climate extremes and long-term trend analysis.
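The two scores quoted above, root mean square error and correlation coefficient, can be written out explicitly. The sketch below is a plain-Python illustration of the standard definitions, assuming the correlation coefficient is Pearson's r. The function names and the sample values are hypothetical and do not come from the dataset.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_error / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical volumetric soil moisture values (cm3/cm3), for illustration only.
station_obs = [0.10, 0.20, 0.30, 0.40]
model_pred = [0.12, 0.18, 0.33, 0.37]
print(rmse(station_obs, model_pred), pearson_r(station_obs, model_pred))
```

RMSE is reported in the units of soil moisture itself, while r is unitless and measures only how well the predicted and observed series co-vary.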
Abstract. Accurate information on surface soil moisture (SSM) content at a global scale under different climatic conditions is important for hydrological and climatological applications. Machine learning (ML) based systematic integration of in situ hydrological measurements, complex environmental and climate data, and satellite observations facilitates the generation of the best data products to monitor and analyse the exchanges of water, energy and carbon in the Earth system at a proper space-time resolution. This study investigates the estimation of daily SSM using eight optimised ML algorithms and ten ensemble models (constructed via model bootstrap aggregating techniques and five-fold cross-validation). The algorithmic implementations were trained and tested using International Soil Moisture Network (ISMN) data collected from 1722 stations distributed across the world. The results showed that the K-neighbours Regressor (KNR) performs best on the “test_random” set, while the Random Forest Regressor (RFR) performs best on “test_temporal” and “test_independent-stations”. Independent evaluation on novel stations across different climate zones was conducted. For the optimised ML algorithms, the median RMSEs were below 0.1 cm³/cm³. GradientBoosting (GB), Multi-layer Perceptron Regressor (MLPR), Stochastic Gradient Descent Regressor (SGDR), and Random Forest Regressor (RFR) achieved a median r score of 0.6 in twelve, eleven, nine and nine climate zones, respectively, out of fifteen climate zones. The performance of the ensemble models improved significantly, with the median RMSE below 0.075 cm³/cm³ for all climate zones. All voting regressors achieved r scores above 0.6 in thirteen climate zones; the exceptions were BSh and BWh, because of the sparse distribution of training stations. The evaluation metrics showed that ensemble models can improve on the performance of single ML algorithms and achieve more stable results.
Based on the results computed for three different test sets, the ensemble model with KNR, RFR and XB performed the best. Overall, our investigation shows that ensemble machine learning algorithms have a greater capability for predicting SSM than the optimised base ML algorithms, and indicates their considerable potential for estimating water cycle budgets, managing irrigation and predicting crop yields.
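The abstract refers to voting regressors built from several base models. As a minimal sketch of the general idea, and not of the authors' implementation, a voting regressor can be viewed as a (possibly weighted) average of the base models' predictions. The function name `voting_average`, the equal default weights, and the prediction values below are all assumptions made for illustration.

```python
def voting_average(predictions_by_model, weights=None):
    """Combine per-model predictions by (weighted) averaging.

    `predictions_by_model` maps a model label to a list of predictions;
    all lists must have the same length. Equal weights are assumed by default.
    """
    names = list(predictions_by_model)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_samples = len(predictions_by_model[names[0]])
    return [
        sum(weights[name] * predictions_by_model[name][i] for name in names)
        / total_weight
        for i in range(n_samples)
    ]


# Hypothetical SSM predictions (cm3/cm3) from three base models, using the
# labels from the abstract; the numbers are invented for this example.
base_predictions = {
    "KNR": [0.20, 0.30],
    "RFR": [0.22, 0.28],
    "XB": [0.24, 0.32],
}
ensemble = voting_average(base_predictions)
print(ensemble)
```

Averaging tends to cancel errors that the base models do not share, which is one plausible reading of why the ensembles are reported as more stable than any single algorithm.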
Figure S1. Taylor diagram on (a) "test_random", (b) "test_temporal", and (c) "test_independent-stations".