Abstract: The visible and near-infrared (VNIR) spectroscopy prediction model is an effective tool for predicting soil organic matter (SOM) content. The predictive accuracy of the VNIR model depends strongly on the selection of the calibration set. However, conventional methods for selecting the calibration set consider only the SOM gradient or the soil VNIR spectra and neglect the influence of environmental variables. Because soil samples generally show strong spatial variability, the relationship between SOM content and VNIR spectra may vary with location and the surrounding environment. Hence, VNIR prediction models based on conventional calibration set selection methods can be biased, especially when estimating soil properties with high spatial variability (e.g., SOM). To enable calibration set selection to account for SOM spatial variation and environmental influence, this paper proposes an improved selection method. The proposed method combines the improved multi-variable association relationship clustering mining (MVARC) method with the Rank-Kennard-Stone (Rank-KS) method in order to jointly consider the SOM gradient, spectral information, and environmental variables. In the proposed MVARC-R-KS method, MVARC integrates the Apriori algorithm, a density-based clustering algorithm, and Delaunay triangulation. The MVARC method is first used to adaptively mine clustering distribution zones in which environmental variables exert a similar influence on soil samples. The feasibility of the MVARC method is demonstrated by an experiment on a simulated dataset. The calibration set is then selected evenly from the clustering zones and the remaining zone by using the Rank-KS algorithm, so that the selected calibration set is not dominated by a single property.
The proposed MVARC-R-KS approach is applied to select a calibration set for constructing a VNIR prediction model of SOM content in the riparian areas of the Jianghan Plain in China. Results indicate that the calibration set selected using the MVARC-R-KS method is representative of the component concentration, spectral information, and environmental variables. Compared with classical calibration set selection methods, the MVARC-R-KS method also yields a calibration set that produces a VNIR model of SOM content with a relatively higher degree of fit and accuracy.
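The zone-wise selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the classic Kennard-Stone algorithm rather than the rank-based variant, it assumes the zone labels have already been produced by some clustering step (MVARC itself is not reproduced here), and it interprets "evenly" as taking the same fraction of samples from every zone. The function names `kennard_stone` and `select_calibration` and the `fraction` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def kennard_stone(X, k):
    """Pick k row indices of X with the classic Kennard-Stone max-min rule."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start from one end of the most distant pair of samples.
    first, _ = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(first)]
    while len(selected) < k:
        # Distance from every sample to its nearest already-selected sample.
        min_d = d[:, selected].min(axis=1)
        min_d[selected] = -1.0  # never pick the same sample twice
        selected.append(int(np.argmax(min_d)))
    return selected


def select_calibration(X, zones, fraction):
    """Apply Kennard-Stone inside each zone (assumption: equal fraction per zone)."""
    X = np.asarray(X, dtype=float)
    zones = np.asarray(zones)
    chosen = []
    for z in np.unique(zones):
        idx = np.flatnonzero(zones == z)
        k = max(1, int(round(fraction * len(idx))))
        picks = kennard_stone(X[idx], k)
        chosen.extend(int(i) for i in idx[picks])
    return sorted(chosen)
```

Here `X` would hold one row per soil sample (for example, spectral features) and `zones` one label per sample, with a separate label for the remaining zone. Selecting within each zone keeps every zone represented in the calibration set, which is the property the abstract attributes to the combined method.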