Sparse regression interaction models for spatial prediction of soil properties in 3D

Pejović, Milutin; Nikolić, Mladen; Heuvelink, G.B.M.; Hengl, Tomislav; Kilibarda, Milorad; Bajat, Branislav

doi:10.1016/j.cageo.2018.05.008

Cited by 17 publications

(5 citation statements)

References 27 publications

“…For MP and CAL grid-search steps, these locations were selected using the method of Meyer et al. [46] to benefit splitting diversity. In the final model adjustment, prior to predictions, sample-site splitting was conducted by means of their coordinates and the K-means algorithm to ensure equal spatial distribution [48].…”

Section: Model Selection and Performance Evaluation (mentioning)

confidence: 99%

“…The R packages 'raster' [55] and 'sf' [56] were used for remote sensing and spatial data manipulation, and 'doParallel' [57] for parallel computing. An adaptation of the stratfold3d function of the 'sparsereg3D' package [48] was used to make the equally spatially distributed LLOCV folds, while the spatially random splits were created with the CreateSpacetimeFolds function from the 'CAST' package [58].…”

Section: Software (mentioning)

confidence: 99%
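
The two statements above describe building spatially balanced leave-location-out folds from site coordinates with K-means. The Python sketch below illustrates one way this could work; it is not the stratfold3d implementation. It assumes that K-means clusters act as spatial strata and that the sites of each cluster are dealt round-robin into folds, so every fold draws sites from every region. The names `kmeans` and `spatial_folds` and all parameters are hypothetical.

```python
import random
from collections import defaultdict


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def spatial_folds(sites, n_folds, n_clusters, seed=0):
    """Map each site id to a fold index.

    sites: dict of site_id -> (x, y). Assumption: clusters are used as strata,
    and sites are dealt round-robin so each fold samples every cluster.
    """
    ids = sorted(sites)
    labels = kmeans([sites[i] for i in ids], n_clusters, seed=seed)
    by_cluster = defaultdict(list)
    for site_id, label in zip(ids, labels):
        by_cluster[label].append(site_id)

    rng = random.Random(seed)
    fold_of = {}
    offset = 0  # running counter keeps overall fold sizes within one site of each other
    for label in sorted(by_cluster):
        members = by_cluster[label]
        rng.shuffle(members)
        for j, site_id in enumerate(members):
            fold_of[site_id] = (offset + j) % n_folds
        offset += len(members)
    return fold_of
```

Because folds are assigned per site rather than per sample, all samples collected at one location end up in the same fold, which is the defining property of leave-location-out cross-validation.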

Live Fuel Moisture Content Mapping in the Mediterranean Basin Using Random Forests and Combining MODIS Spectral and Thermal Data

2022

Remotely sensed vegetation indices have been widely used to estimate live fuel moisture content (LFMC). However, marked differences in vegetation structure affect the relationship between field-measured LFMC and reflectance, which limits spatial extrapolation of these indices. To overcome this limitation, we explored the potential of random forests (RF) to estimate LFMC at the subcontinental scale in the Mediterranean basin wildland. We built RF models (LFMCRF) using a combination of MODIS spectral bands, vegetation indices, surface temperature, and the day of year as predictors. We used the Globe-LFMC and the Catalan LFMC monitoring program databases as ground-truth samples (10,374 samples). LFMCRF was calibrated with samples collected between 2000 and 2014 and validated with samples from 2015 to 2019, with overall root mean square errors (RMSE) of 19.9% and 16.4%, respectively, which were lower than current approaches based on radiative transfer models (RMSE ~74–78%). We used our approach to generate a public database with weekly LFMC maps across the Mediterranean basin.

“…The test dataset was then used to assess the performance of the model. The advantage of nested LLOCV over standard LLOCV is that the data of the test fold are not used to tune the RF hyperparameters [46]. The hyperparameters for the final RF models were then calculated based on standard LLOCV, i.e., without nested folds (their role is just to approximate the accuracy of the final model).…”

Section: Real-world Case Studies (mentioning)

confidence: 99%
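
The statement above contrasts nested and standard LLOCV: in the nested variant, hyperparameters are tuned only on the training folds, so the test fold never influences tuning. The Python sketch below shows that loop structure under stated assumptions. It substitutes a toy one-dimensional k-nearest-neighbour regressor for the random forest used in the cited work, and the names `nested_llocv`, `fit_knn`, and `rmse` are hypothetical.

```python
import math


def fit_knn(train, k):
    """Toy stand-in model: predict the mean y of the k training samples nearest in x."""
    def predict(x):
        nearest = sorted(train, key=lambda s: abs(s[1] - x))[:k]
        return sum(s[2] for s in nearest) / len(nearest)
    return predict


def rmse(model, data):
    """Root mean square error of a fitted model on (location, x, y) samples."""
    return math.sqrt(sum((model(s[1]) - s[2]) ** 2 for s in data) / len(data))


def nested_llocv(samples, fold_of, grid, fit, error):
    """Nested leave-location-out CV.

    samples: list of (location_id, x, y); fold_of: location_id -> fold index.
    Returns (outer test errors, hyperparameter chosen for each outer fold).
    """
    folds = sorted(set(fold_of.values()))
    outer_errors, chosen = [], []
    for test_fold in folds:
        test = [s for s in samples if fold_of[s[0]] == test_fold]
        train = [s for s in samples if fold_of[s[0]] != test_fold]
        inner_folds = [f for f in folds if f != test_fold]

        def inner_score(param):
            # Inner LLOCV uses the training folds only; the test fold is never seen.
            errs = []
            for val_fold in inner_folds:
                val = [s for s in train if fold_of[s[0]] == val_fold]
                fit_set = [s for s in train if fold_of[s[0]] != val_fold]
                errs.append(error(fit(fit_set, param), val))
            return sum(errs) / len(errs)

        best = min(grid, key=inner_score)
        chosen.append(best)
        # Refit on all training folds with the tuned value, then score on the held-out fold.
        outer_errors.append(error(fit(train, best), test))
    return outer_errors, chosen
```

The outer errors approximate the accuracy of the whole tuning-plus-fitting procedure. As the quoted text notes, the hyperparameters of the final model would then be chosen with standard (non-nested) LLOCV on all folds.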

Random Forest Spatial Interpolation

et al. 2020

Self Cite

For many decades, kriging and deterministic interpolation techniques, such as inverse distance weighting and nearest neighbour interpolation, have been the most popular spatial interpolation techniques. Kriging with external drift and regression kriging have become basic techniques that benefit both from spatial autocorrelation and covariate information. More recently, machine learning techniques, such as random forest and gradient boosting, have become increasingly popular and are now often used for spatial interpolation. Some attempts have been made to explicitly take the spatial component into account in machine learning, but so far, none of these approaches have taken the natural route of incorporating the nearest observations and their distances to the prediction location as covariates. In this research, we explored the value of including observations at the nearest locations and their distances from the prediction location by introducing Random Forest Spatial Interpolation (RFSI). We compared RFSI with deterministic interpolation methods, ordinary kriging, regression kriging, Random Forest and Random Forest for spatial prediction (RFsp) in three case studies. The first case study made use of synthetic data, i.e., simulations from normally distributed stationary random fields with a known semivariogram, for which ordinary kriging is known to be optimal. The second and third case studies evaluated the performance of the various interpolation methods using daily precipitation data for the 2016–2018 period in Catalonia, Spain, and mean daily temperature for the year 2008 in Croatia. Results of the synthetic case study showed that RFSI outperformed most simple deterministic interpolation techniques and had similar performance as inverse distance weighting and RFsp. As expected, kriging was the most accurate technique in the synthetic case study. 
In the precipitation and temperature case studies, RFSI mostly outperformed regression kriging, inverse distance weighting, random forest, and RFsp. Moreover, RFSI was substantially faster than RFsp, particularly when the training dataset was large and high-resolution prediction maps were made.
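
The abstract above describes using the nearest observations and their distances to the prediction location as covariates. The Python sketch below builds such a feature vector. It is a minimal illustration, not the published RFSI code: the function name `rfsi_features`, the feature ordering (value, distance, value, distance, ...), and the `exclude_self` option are assumptions, and the random forest that would consume these features alongside environmental covariates is not shown.

```python
import math


def rfsi_features(target, observations, n_nearest=3, exclude_self=False):
    """Build spatial covariates for one location.

    target: (x, y) of the prediction location.
    observations: list of ((x, y), value) pairs.
    exclude_self: assumed option for training rows, so that an observation
    is not used as its own nearest neighbour.
    Returns [value_1, dist_1, value_2, dist_2, ...], nearest first.
    """
    candidates = [
        (math.dist(target, loc), value)
        for loc, value in observations
        if not (exclude_self and loc == target)
    ]
    candidates.sort(key=lambda c: c[0])  # nearest observation first
    features = []
    for dist, value in candidates[:n_nearest]:
        features.extend([value, dist])
    return features


obs = [((0.0, 0.0), 1.0), ((3.0, 4.0), 2.0), ((6.0, 8.0), 3.0)]
print(rfsi_features((0.0, 0.0), obs, n_nearest=2))
# → [1.0, 0.0, 2.0, 5.0]
print(rfsi_features((0.0, 0.0), obs, n_nearest=2, exclude_self=True))
# → [2.0, 5.0, 3.0, 10.0]
```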

“…The daily MeteoSerbia1km dataset was validated using nested 5-fold LLOCV, which combines nested k-fold [32] and leave-location-out cross-validation. For nested 5-fold LLOCV, as with the regular 5-fold LLOCV, the entire dataset was split into five folds.…”

Section: Technical Validation (mentioning)

confidence: 99%

A high-resolution daily gridded meteorological dataset for Serbia made by Random Forest Spatial Interpolation

et al. 2021

Self Cite

We produced the first daily gridded meteorological dataset at a 1-km spatial resolution across Serbia for 2000–2019, named MeteoSerbia1km. The dataset consists of five daily variables: maximum, minimum and mean temperature, mean sea-level pressure, and total precipitation. In addition to daily summaries, we produced monthly and annual summaries, and daily, monthly, and annual long-term means. Daily gridded data were interpolated using the Random Forest Spatial Interpolation methodology, based on using the nearest observations and distances to them as spatial covariates, together with environmental covariates to make a random forest model. The accuracy of the MeteoSerbia1km daily dataset was assessed using nested 5-fold leave-location-out cross-validation. All temperature variables and sea-level pressure showed high accuracy, although accuracy was lower for total precipitation, due to the discontinuity in its spatial distribution. MeteoSerbia1km was also compared with the E-OBS dataset with a coarser resolution: both datasets showed similar coarse-scale patterns for all daily meteorological variables, except for total precipitation. As a result of its high resolution, MeteoSerbia1km is suitable for further environmental analyses.
