This article describes a data-driven framework based on spatiotemporal machine learning to produce distribution maps for 16 tree species (Abies alba Mill., Castanea sativa Mill., Corylus avellana L., Fagus sylvatica L., Olea europaea L., Picea abies L. H. Karst., Pinus halepensis Mill., Pinus nigra J. F. Arnold, Pinus pinea L., Pinus sylvestris L., Prunus avium L., Quercus cerris L., Quercus ilex L., Quercus robur L., Quercus suber L. and Salix caprea L.) at high spatial resolution (30 m). Tree occurrence data totalling three million points were used to train different algorithms: random forest, gradient-boosted trees, generalized linear models, k-nearest neighbors, CART and an artificial neural network. A stack of 305 coarse- and high-resolution covariates representing spectral reflectance, different biophysical conditions and biotic competition was used as predictors for realized distributions, while potential distribution was modelled with environmental predictors only. Logloss and computing time were used to select the three best algorithms to tune and train an ensemble model based on stacking with a logistic regressor as a meta-learner. An ensemble model was trained for each species: probability and model uncertainty maps of realized distribution were produced for each species using a time window of 4 years, for a total of six distribution maps per species, while for potential distributions only one map per species was produced. Results of spatial cross-validation show that the ensemble model consistently outperformed or performed as well as the best individual model in both potential and realized distribution tasks, with potential distribution models achieving higher predictive performance (TSS = 0.898, R²logloss = 0.857) than realized distribution ones on average (TSS = 0.874, R²logloss = 0.839). Ensemble models for Q. suber achieved the best performance in both potential (TSS = 0.968, R²logloss = 0.952) and realized (TSS = 0.959, R²logloss = 0.949) distribution, while P. sylvestris (TSS = 0.731, 0.785, R²logloss = 0.585, 0.670, respectively, for potential and realized distribution) and P. nigra (TSS = 0.658, 0.686, R²logloss = 0.623, 0.664) achieved the worst. Importance of predictor variables differed across species and models: the green band for summer and the Normalized Difference Vegetation Index (NDVI) for fall were the most frequent and important predictors for realized distribution, and diffuse irradiation and precipitation of the driest quarter (BIO17) for potential distribution. On average, fine-resolution models outperformed coarse-resolution models (250 m) for realized distribution (TSS = +6.5%, R²logloss = +7.5%). The framework shows how combining continuous and consistent Earth Observation time series data with state-of-the-art machine learning can be used to derive dynamic distribution maps. The produced predictions can be used to quantify temporal trends of potential forest degradation and species composition change.
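The two scores reported in the abstract above can be illustrated with a small sketch. TSS (True Skill Statistic) is sensitivity plus specificity minus one. For R²logloss, the abstract does not give the formula, so the version below is an assumption: one minus the ratio of the model's logloss to that of a baseline that always predicts the observed prevalence. The function names are hypothetical and the data are toy values, not results from the study.

```python
import math

def tss(y_true, y_pred):
    """True Skill Statistic = sensitivity + specificity - 1 (binary labels 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)

def r2_logloss(y_true, probs):
    """ASSUMED definition: 1 - logloss(model) / logloss(prevalence baseline)."""
    prevalence = sum(y_true) / len(y_true)
    baseline = log_loss(y_true, [prevalence] * len(y_true))
    return 1 - log_loss(y_true, probs) / baseline

# Toy presence (1) / absence (0) labels, hard predictions and probabilities.
labels = [1, 1, 0, 0]
hard_predictions = [1, 0, 0, 0]
probabilities = [0.9, 0.4, 0.2, 0.1]

toy_tss = tss(labels, hard_predictions)
toy_r2 = r2_logloss(labels, probabilities)
```

Under this assumed definition a score of 0 means the model is no better than the prevalence baseline, and a negative score means it is worse.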
This paper describes a data-driven framework based on spatio-temporal ensemble machine learning to produce distribution maps for 16 forest tree species (Abies alba Mill., Castanea sativa Mill., Corylus avellana L., Fagus sylvatica L., Olea europaea L., Picea abies L. H. Karst., Pinus halepensis Mill., Pinus nigra J. F. Arnold, Pinus pinea L., Pinus sylvestris L., Prunus avium L., Quercus cerris L., Quercus ilex L., Quercus robur L., Quercus suber L. and Salix caprea L.) at high spatial resolution (30 m). Tree occurrence data totalling 3 million points were used to train different Machine Learning (ML) algorithms: random forest, gradient-boosted trees, generalized linear models, k-nearest neighbors, CART and an artificial neural network. A stack of 585 coarse- and high-resolution covariates representing spectral reflectance (Landsat bands, spectral indices; time series of seasonal composites), different biophysical conditions (i.e., temperature, precipitation, elevation, lithology) and biotic competition (other species distribution maps) was used as predictors for realized distributions, while potential distribution was modelled with environmental predictors only. Logloss and computing time were used to select the three best algorithms to train, for each species, an ensemble model based on stacking with a logistic regressor as a meta-learner. High-resolution (30 m) probability and model uncertainty maps of realized distribution were produced for each species using a time window of 4 years, for a total of 6 distribution maps per species over the studied period, while for potential distributions only one map per species was produced. Results of spatial cross-validation show that Olea europaea and Quercus suber achieved the best performance in both potential and realized distribution, while Pinus sylvestris and Salix caprea achieved the worst. Further analysis shows that fine-resolution models consistently outperformed coarse-resolution models (250 m) for realized distribution (average decrease in logloss: +53%). Realized distribution models achieved higher predictive performance than potential distribution ones. Importance of predictor variables differed across species and models: the green band for summer and the NDWI and NDVI for fall were the most important and frequent predictors for realized distribution, and diffuse irradiation and precipitation of the driest quarter for potential distribution. The ensemble model outperformed or performed as well as the best individual model in all potential species distributions, while for ten species it performed worse than the best individual model in modeling realized distributions. The framework shows how combining continuous and consistent Earth Observation (EO) time series data with state-of-the-art ML can be used to derive dynamic distribution maps. The produced time-series occurrence predictions can be used to quantify temporal trends and detect potential forest degradation.
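The stacking design described in this abstract (three base learners combined by a logistic regressor acting as meta-learner) might be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: scikit-learn is used for convenience, the three base learners are picked only as examples (the abstract does not say which three were selected for each species), the data are synthetic, and the internal folds are plain k-fold rather than spatial cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def build_stack(random_state=0):
    """Stacked ensemble: three base learners, logistic regression on top."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("gbt", GradientBoostingClassifier(n_estimators=50, random_state=random_state)),
        ("glm", LogisticRegression(max_iter=1000)),
    ]
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # the meta-learner
        stack_method="predict_proba",  # meta-learner sees base probabilities
        cv=5,  # out-of-fold base predictions are used to fit the meta-learner
    )

# Synthetic presence/absence data standing in for occurrence points and covariates.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

stack = build_stack().fit(X_train, y_train)
proba = stack.predict_proba(X_test)[:, 1]  # predicted probability of presence
test_logloss = log_loss(y_test, proba)
```

Fitting the meta-learner on out-of-fold predictions rather than in-sample ones keeps it from simply rewarding the base learner that overfits most.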
The global potential distribution of biomes (natural vegetation) was modelled using 8959 training points from the BIOME 6000 dataset and a stack of 72 environmental covariates representing terrain and the current climatic conditions based on historical long-term averages (1979–2013). An ensemble machine learning model based on stacked regularization was used, with multinomial logistic regression as the meta-learner and spatial blocking (100 km) to deal with spatial autocorrelation of the training points. Results of spatial cross-validation for the BIOME 6000 classes show an overall accuracy of 0.67 and R²logloss of 0.61, with "tropical evergreen broadleaf forest" being the class with the highest gain in predictive performance (R²logloss = 0.74) and "prostrate dwarf shrub tundra" the class with the lowest (R²logloss = −0.09) compared to the baseline. Temperature-related covariates were the most important predictors, with the mean diurnal range (BIO2) being shared by all the base-learners (i.e., random forest, gradient boosted trees and generalized linear models). The model was next used to predict the distribution of future biomes for the periods 2040–2060 and 2061–2080 under three climate change scenarios (RCP 2.6, 4.5 and 8.5). Comparisons of predictions for the three epochs (present, 2040–2060 and 2061–2080) show that increasing aridity and higher temperatures will likely result in significant shifts in natural vegetation in the tropical area (shifts from tropical forests to savannas of up to 1.7 × 10⁵ km² by 2080) and around the Arctic Circle (shifts from tundra to boreal forests of up to 2.4 × 10⁵ km² by 2080). Projected global maps at 1 km spatial resolution are provided as probability and hard-class maps for BIOME 6000 classes and as hard-class maps for the IUCN classes (six aggregated classes). Uncertainty maps (prediction error) are also provided and should be used for careful interpretation of the future projections.
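The spatial blocking mentioned in this abstract can be illustrated with a minimal sketch: points are grouped into 100 km grid cells and whole cells, never individual points, are assigned to cross-validation folds, so that nearby (spatially autocorrelated) points do not end up on both sides of a train/test split. The abstract does not describe how blocks were built or assigned, so the grid layout, the greedy balancing rule, the function names and the assumption that coordinates are in metres in a projected coordinate system are all illustrative.

```python
import math
from collections import defaultdict

def block_id(x, y, block_size=100_000.0):
    """Grid cell (column, row) containing a point; coordinates in metres."""
    return (math.floor(x / block_size), math.floor(y / block_size))

def spatial_block_folds(coords, n_folds=5, block_size=100_000.0):
    """Assign point indices to folds so that each block stays in one fold."""
    blocks = defaultdict(list)
    for index, (x, y) in enumerate(coords):
        blocks[block_id(x, y, block_size)].append(index)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest blocks first, each into the
    # fold that currently holds the fewest points.
    for bid in sorted(blocks, key=lambda b: (-len(blocks[b]), b)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(blocks[bid])
    return folds

# Toy projected coordinates (metres): points 0 and 1 share a block, as do 3 and 4.
points = [
    (10_000, 10_000),
    (20_000, 90_000),
    (150_000, 10_000),
    (250_000, 250_000),
    (260_000, 240_000),
]
folds = spatial_block_folds(points, n_folds=2)
fold_of = {i: k for k, fold in enumerate(folds) for i in fold}
```

Each fold then serves once as the held-out set while the model is trained on the points in the remaining folds.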