Abstract. Geophysical sensors combined with machine learning
algorithms were used to understand the pedosphere system and landscape
processes and to model soil attributes. In this research, we used parent
material, terrain attributes, and data from geophysical sensors in different
combinations to test and compare different and novel machine learning
algorithms to model soil attributes. We also analyzed the importance of
pedoenvironmental variables in the predictive models. To that end, we collected
soil physicochemical and geophysical data (gamma-ray emission from uranium,
thorium, and potassium; magnetic susceptibility; and apparent electrical
conductivity) with three sensors (gamma-ray spectrometer, RS 230;
susceptibilimeter, KT10 Terraplus; and conductivimeter, EM38 Geonics) at
75 points and analyzed the data. The best-performing models varied among
soil attributes, with R² values of 0.48, 0.36, 0.44, 0.36, 0.25, and 0.31
for clay, sand, Fe2O3, TiO2, SiO2, and cation exchange capacity,
respectively. Selecting covariates in three phases (removal of covariates
with variance close to zero, removal by correlation, and removal by
importance) was adequate to increase model parsimony. The results were
validated using nested leave-one-out cross-validation. The
prediction of soil attributes by machine learning algorithms yielded
adequate performance for field-collected data, without any sample preparation,
for most of the tested attributes (R² values ranging from 0.20 to
0.50). The use of four regression algorithms also proved important,
since each of the tested algorithms was the best choice for at least one
attribute. The
performance of the best algorithm for each attribute was better than that
obtained by using the mean value for the entire area, as judged by the
root mean square error (RMSE) and the mean absolute error (MAE). The sensor
combination that reached the highest model performance was the gamma-ray
spectrometer with the susceptibilimeter. The most important variables for
most predictions were parent material, digital elevation, standardized
height, and magnetic susceptibility. We concluded that soil attributes can
be efficiently modeled
from geophysical data using machine learning techniques and geophysical sensor
combinations. This approach can facilitate future soil mapping in a more
time-efficient and environmentally friendly manner.
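
The validation scheme named above, nested leave-one-out cross-validation
compared against a whole-area mean baseline using RMSE and MAE, can be
illustrated with a minimal sketch. It is not the implementation used in this
study: the data are synthetic, the two candidate models (a training-mean
predictor and a one-covariate least-squares fit) merely stand in for the four
regression algorithms, and all function names (fit_mean, fit_linear,
loo_predictions, nested_loo) are hypothetical.

```python
import math
import random


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares with a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


# Hypothetical stand-ins for the candidate regression algorithms.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear}


def loo_predictions(fit, xs, ys):
    """Leave-one-out predictions of a single model on (xs, ys)."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))


def mae(ys, preds):
    return sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)


def nested_loo(xs, ys):
    """Outer loop holds out one point for error estimation; the inner
    leave-one-out loop selects the model using the remaining points only."""
    preds = []
    for i in range(len(xs)):
        tr_x, tr_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        best = min(
            CANDIDATES,
            key=lambda k: rmse(tr_y, loo_predictions(CANDIDATES[k], tr_x, tr_y)),
        )
        preds.append(CANDIDATES[best](tr_x, tr_y)(xs[i]))
    return preds


# Synthetic example: one covariate and one target, 30 points (assumed values).
random.seed(0)
x = [random.uniform(0, 10) for _ in range(30)]
y = [20 + 3 * xi + random.gauss(0, 2) for xi in x]

nested = nested_loo(x, y)
baseline = loo_predictions(fit_mean, x, y)
print("nested LOO  RMSE, MAE:", rmse(y, nested), mae(y, nested))
print("mean model  RMSE, MAE:", rmse(y, baseline), mae(y, baseline))
```

Because model selection happens only inside the inner loop, the held-out point
never influences which model is chosen, so the outer error estimate is not
biased by the selection step.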