With the rapid accumulation of water flux observations from global eddy-covariance flux sites, many studies have used data-driven approaches to model water fluxes, employing a variety of predictors and machine learning algorithms. However, it remains unclear how these model features affect prediction accuracy. To fill this gap, we evaluated records of 139 models collected from 32 such studies. Support vector machines (SVMs; average R-squared = 0.82) and random forests (RF; average R-squared = 0.81) outperformed the other evaluated algorithms that had sufficient sample sizes, in both cross-study and intrastudy (same data) comparisons. Models applied in arid regions had higher average accuracy than models applied in other climate types. Average accuracy was slightly lower for forest sites (average R-squared = 0.76) than for croplands and grasslands (average R-squared = 0.8 and 0.79, respectively) but higher than for shrubland sites (average R-squared = 0.67). Using Rn/Rs, precipitation, Ta, and the fraction of absorbed photosynthetically active radiation (FAPAR) as predictors improved model accuracy. The combined use of Ta and Rn/Rs was very effective, especially in forests, while in grasslands the combination of Ws and Rn/Rs was also effective. Random cross-validation yielded higher model accuracy than spatial and temporal cross-validation, but spatial cross-validation matters more when models are used for spatial extrapolation. These findings can help guide future research on machine-learning-based modeling of water fluxes.
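The distinction between random, spatial, and temporal cross-validation can be illustrated with a minimal sketch. It is not the procedure of any of the reviewed studies; the site labels, years, fold counts, and function names below are hypothetical and chosen only for illustration. Random cross-validation shuffles individual samples, so records from one flux site can appear in both training and test folds, whereas a grouped split keeps every record of a site (spatial) or of a year (temporal) in a single fold.

```python
import random
from collections import defaultdict


def random_folds(n_samples, k, seed=0):
    """Random CV: shuffle sample indices and deal them into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grouped_folds(groups, k):
    """Grouped CV: all samples sharing a group label land in the same fold.

    With site IDs as groups this is a spatial split; with years as groups
    it is a temporal split.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = [[] for _ in range(k)]
    # Assign the largest groups first, each to the currently smallest fold.
    for g in sorted(members, key=lambda g: -len(members[g])):
        min(folds, key=len).extend(members[g])
    return folds


def groups_shared(folds, groups):
    """Count group labels that occur in more than one fold."""
    seen = defaultdict(set)
    for f, fold in enumerate(folds):
        for i in fold:
            seen[groups[i]].add(f)
    return sum(1 for s in seen.values() if len(s) > 1)


# Hypothetical records: 4 sites x 3 years x 5 samples per site-year.
records = [
    (site, year)
    for site in ("A", "B", "C", "D")
    for year in (2001, 2002, 2003)
    for _ in range(5)
]
sites = [r[0] for r in records]
years = [r[1] for r in records]

random_cv = random_folds(len(records), k=4)
spatial_cv = grouped_folds(sites, k=4)
temporal_cv = grouped_folds(years, k=3)

print("sites split across folds (random): ", groups_shared(random_cv, sites))
print("sites split across folds (spatial):", groups_shared(spatial_cv, sites))
print("years split across folds (temporal):", groups_shared(temporal_cv, years))
```

In the random split, test samples come from sites the model has already seen during training, which is one plausible reason random cross-validation reports higher accuracy; the spatial split withholds whole sites and so mimics prediction at unobserved locations.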
Introduction
Evapotranspiration (ET) is one of the most important components of the water cycle in terrestrial ecosystems. It is also a key variable linking ecosystem functioning, carbon and climate feedbacks, agricultural management, and water resources (Fisher et al., 2017). Quantifying ET for regions, continents, or the globe can improve our understanding of water, heat, and carbon interactions, which is important for global change research (Xu et al., 2018). Information on ET has been used in many fields, including, but not limited to, droughts and heat waves (Miralles et al., 2014), regional water balance closures (Chen et al., 2014; Sahoo et al., 2011), agricultural management (Allen et al., 2011), water resources management (Anderson et al., 2012),