Summary
Given the costs of soil survey it is necessary to make the best use of available datasets, but data that differ with respect to some aspect of the sampling or analytical protocol cannot be combined simply. In this paper we consider a case where two datasets were available on the concentration of plant‐available magnesium in the topsoil. The datasets were the Representative Soil Sampling Scheme (RSSS) and the National Soil Inventory (NSI) of England and Wales. The variable was measured over the same depth interval and with the same laboratory method, but the sample supports were different and so the datasets differ in their variance. We used a multivariate geostatistical model, the linear model of coregionalization (LMCR), to model the joint spatial distribution of the two datasets. The model allowed us to elucidate the effects of the sample support on the two datasets, and to show that there was a strong correlation between the underlying variables. The LMCR allowed us to make spatial predictions of the variable on the RSSS support by cokriging the RSSS data with the NSI data. We used cross‐validation to test the validity of the LMCR and showed how incorporating the NSI data restricted the range of prediction error variances relative to univariate ordinary kriging predictions from the RSSS data alone. The standardized squared prediction errors were computed and the coverage of prediction intervals (i.e. the proportion of sites at which the prediction interval included the observed value of the variable). Both these statistics suggested that the prediction error variances were consistent for the cokriging predictions but not for the ordinary kriging predictions from the simple combination of the RSSS and NSI data, which might be proposed on the basis of their very similar mean values. The LMCR is therefore proposed as a general tool for the combined analysis of different datasets on soil properties.
Highlights
Differences in sample support mean that two datasets on a soil property cannot be combined simply.
We showed how a multivariate geostatistical model can be used to elucidate the relationships between two such datasets.
The same model allows soil properties to be mapped jointly from such data.
This offers a general basis for combining soil datasets from diverse sources