One of the core tasks in digital soil mapping (DSM) studies is estimating the spatial distribution of soil variables. Assessing the uncertainty of these estimates is equally important, yet many current DSM studies omit it. Machine learning (ML) methods are increasingly used in this field, but most of them have no intrinsic capability for uncertainty estimation. One solution is to use ML methods that combine advanced prediction capabilities with built-in uncertainty estimates, such as Quantile Regression Forests (QRF). In the current paper, the prediction and uncertainty capabilities of QRF, Random Forests (RF), and geostatistical methods were assessed. QRF exhibited outstanding results in predicting soil organic matter (OM) in the study area. In particular, its R² was much higher than that of the geostatistical methods, indicating that the model explains more of the variation. Moreover, the uncertainty maps show that QRF can also provide a good estimate of the uncertainty, with a distinct representation of the local variation in specific parts of the area, which is considered a significant advantage, especially for decision support purposes.
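The kind of uncertainty estimate described above can be illustrated with a minimal sketch. QRF derives quantiles of the conditional distribution of the target at each location. The sketch below assumes we already hold a sample of such conditional values for one location and computes a 90% prediction interval from its empirical quantiles. The function names, the unweighted sample, and the OM values are illustrative assumptions, not the method or data of the paper.

```python
def empirical_quantile(values, q):
    """Linearly interpolated empirical quantile of a sample, for 0 <= q <= 1."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)           # fractional index into the sorted sample
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def prediction_interval(values, lower=0.05, upper=0.95):
    """Return (lower bound, upper bound, interval width) for one location."""
    lo = empirical_quantile(values, lower)
    hi = empirical_quantile(values, upper)
    return lo, hi, hi - lo


# Hypothetical conditional sample of soil OM (%) at a single map cell.
om_sample = [1.2, 1.4, 1.5, 1.7, 1.8, 2.0, 2.1, 2.3, 2.6, 3.1]
low, high, width = prediction_interval(om_sample)
print(f"90% interval: [{low:.2f}, {high:.2f}], width {width:.2f}")
```

Mapping the interval width cell by cell is one common way to obtain an uncertainty map: wide intervals mark locations where the prediction is less certain.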
Machine learning (ML) algorithms are extensively used and often achieve outstanding prediction accuracy. However, in some cases, their tendency to overfit, along with inadvertent biases, might produce overly optimistic results. Spatial data are a special kind of data that can introduce biases to ML because of their intrinsic spatial autocorrelation. To address this issue, a dedicated resampling method called spatial cross-validation (SCV) has emerged. The purpose of this study was to evaluate the performance of SCV compared with the conventional random cross-validation (CCV) used in most ML studies. Multiple ML models were created with CCV and SCV to predict groundwater electrical conductivity (EC) with data (A) from Rhodope, Greece, in the summer of 2020; (B) from the same area but at a different time (summer 2019); and (C) from a new area (the Salento peninsula, Italy). The results showed that SCV provides ML models with superior generalization capabilities and, hence, better prediction results on new, unseen data. SCV seems to be able to capture the spatial patterns in the data while also reducing the over-optimism bias that is often associated with CCV methods. Based on these results, SCV could be applied with ML in studies that use spatial data.
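The difference between random and spatial resampling can be sketched as follows. This is a minimal illustration of one common SCV variant, spatial block cross-validation, in which points are grouped into square grid blocks and whole blocks are held out together, so that nearby (autocorrelated) points do not end up on both sides of a split. The function name, block size, and synthetic coordinates are assumptions for illustration; the study may have used a different partitioning scheme.

```python
import random
from collections import defaultdict


def spatial_block_folds(coords, block_size, n_folds, seed=0):
    """Return (train_idx, test_idx) pairs where whole spatial blocks are held out."""
    # Group point indices by the grid block that contains them.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)

    # Shuffle the blocks (not the points) and deal them out to the folds.
    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)
    folds = [[] for _ in range(n_folds)]
    for k, block_id in enumerate(block_ids):
        folds[k % n_folds].extend(blocks[block_id])

    all_idx = set(range(len(coords)))
    return [(sorted(all_idx - set(test)), sorted(test)) for test in folds]


# Hypothetical sampling locations in a 100 x 100 unit study area.
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
splits = spatial_block_folds(coords, block_size=25.0, n_folds=4)
for fold, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train points, {len(test_idx)} test points")
```

Conventional random cross-validation would shuffle the individual points instead of the blocks, which lets close neighbours of a test point remain in the training set and can inflate the apparent accuracy.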
Yield estimations at global or regional spatial scales have been compromised by poor crop model calibration. A methodology for estimating the genetic parameters related to grain growth and yield for the CERES-Wheat crop model is proposed, based on the yield gap concept, the GLUE coefficient estimator, and the global yield gap atlas (GYGA). Yield trials with three durum wheat cultivars in an experimental farm in northern Greece from 2004 to 2010 were used. The calibration strategy conducted with CERES-Wheat (embedded in DSSAT v.4.7.5) in potential mode, taking into account the year-to-year variability of the relative yield gap Yrg (YgC_adj), was: (i) more effective than using only the average site value of Yrg (YgC_unadj) (the relative RMSE ranged from 10 to 13% for YgC_adj vs. 48 to 57% for YgC_unadj) and (ii) superior to the strategy conducted with DSSAT v.4.7.5 in rainfed mode, but slightly inferior to the one conducted with DSSAT v.3.5 in rainfed mode (for which a relative RMSE of 5 to 8% was found). Using an ensemble of 11 EURO-CORDEX regional climate model simulations, earlier anthesis and maturity and a decreased potential yield (from 2.2 to 3.9% for 2021–2050, and from 5.0 to 7.1% for 2071–2100) were found, due to increased temperature and solar radiation. In conclusion, the proposed strategy provides a scientifically robust guideline for crop model calibration that minimizes input requirements, because the crop model is operated in potential mode. Further testing of this methodology is required with different plants, crop models, and environments.
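The relative RMSE figures quoted above can be read with the help of a short sketch. It assumes the common definition of relative RMSE as the root mean square error divided by the mean of the observations, expressed as a percentage; the abstract does not state which normalization was used. The function name and the yield values are hypothetical.

```python
import math


def relative_rmse(observed, simulated):
    """Relative RMSE in percent, assuming normalization by the observed mean."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mse = sum((s - o) ** 2 for o, s in zip(observed, simulated)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_observed


# Hypothetical observed and simulated grain yields (t/ha) for three seasons.
observed_yield = [4.0, 5.0, 6.0]
simulated_yield = [4.5, 4.5, 6.5]
print(f"relative RMSE: {relative_rmse(observed_yield, simulated_yield):.1f}%")
```

Under this definition, a value of 10 to 13% means the typical simulation error is about a tenth of the average observed yield, whereas 48 to 57% means it is roughly half of it.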