Machine learning algorithms find frequent application in spatial prediction of biotic and abiotic environmental variables. However, the characteristics of spatial data, especially spatial autocorrelation, are widely ignored. We hypothesize that this is problematic and results in models that can reproduce training data but are unable to make spatial predictions beyond the locations of the training samples. We assume that not only spatial validation strategies but also spatial variable selection is essential for reliable spatial predictions. We introduce two case studies that use remote sensing to predict land cover and the leaf area index for the "Marburg Open Forest", an open research and education site of Marburg University, Germany. We use the machine learning algorithm Random Forests to train models using non-spatial and spatial cross-validation strategies to understand how spatial variable selection affects the predictions. Our findings confirm that spatial cross-validation is essential in preventing overoptimistic model performance. We further show that highly autocorrelated predictors (such as geolocation variables, e.g. latitude, longitude) can lead to considerable overfitting and result in models that can reproduce the training data but fail in making spatial predictions. The problem becomes apparent in the visual assessment of the spatial predictions that show clear artefacts that can be traced back to a misinterpretation of the spatially autocorrelated predictors by the algorithm. Spatial variable selection could automatically detect and remove such variables that lead to overfitting, resulting in reliable spatial prediction patterns and improved statistical spatial model performance. We conclude that in addition to spatial validation, a spatial variable selection must be considered in spatial predictions of ecological data to produce reliable predictions.
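The difference between non-spatial and spatial cross-validation mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the square-block scheme and the `block_size` parameter are assumptions chosen for illustration. The idea is that whole spatial blocks are held out together, so that test samples are never immediate neighbours of training samples.

```python
import random
from collections import defaultdict


def spatial_block_ids(coords, block_size):
    """Map each (x, y) coordinate to the id of the square block containing it."""
    return [(int(x // block_size), int(y // block_size)) for x, y in coords]


def leave_block_out_folds(coords, block_size, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole spatial blocks are held out.

    Hypothetical helper: a random K-fold split would instead scatter
    neighbouring samples across the training and test sets.
    """
    blocks = defaultdict(list)
    for i, block in enumerate(spatial_block_ids(coords, block_size)):
        blocks[block].append(i)

    # Shuffle the blocks (not the samples) and deal them out to the folds.
    block_keys = sorted(blocks)
    random.Random(seed).shuffle(block_keys)

    for k in range(n_folds):
        test_blocks = set(block_keys[k::n_folds])
        test_idx = sorted(i for b in test_blocks for i in blocks[b])
        train_idx = sorted(
            i for b in block_keys if b not in test_blocks for i in blocks[b]
        )
        yield train_idx, test_idx


# Example: a 10 x 10 grid of sample locations split into four 5 x 5 blocks.
coords = [(float(x), float(y)) for x in range(10) for y in range(10)]
for train_idx, test_idx in leave_block_out_folds(coords, block_size=5.0, n_folds=4):
    print(len(train_idx), len(test_idx))
```

A model evaluated on such folds is scored on locations it has not seen nearby, which is the setting in which the abstract reports that geolocation predictors such as latitude and longitude stop helping. A spatial variable selection could reuse the same folds to score candidate predictor sets, although the selection procedure itself is not sketched here.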
The Tibetan Plateau (TP) is a globally important “water tower” that provides water for nearly 40% of the world’s population. This supply function is claimed to be threatened by pasture degradation on the TP and the associated loss of water regulation functions. However, neither potential large-scale degradation changes nor their drivers are known. Here, we analyse trends in a high-resolution dataset of grassland cover to determine the interactions among vegetation dynamics, climate change and human impacts on the TP. The results reveal that vegetation changes have regionally different triggers: while the vegetation cover has increased since the year 2000 in the north-eastern part of the TP due to an increase in precipitation, it has declined in the central and western parts of the TP due to rising air temperature and declining precipitation. Increasing livestock numbers as a result of land use changes exacerbated the negative trends but were not their exclusive driver. Thus, we conclude that climate variability rather than overgrazing has been the primary cause of large-scale vegetation cover changes on the TP since the new millennium. Since areas of positive and negative changes are almost equal in extent, pasture degradation is not generally proceeding.