A soil quality database (SQDB) is a collection of soil samples described by a given set of parameters, allowing farmers, scientists and other stakeholders to make informed decisions about practices, processes and policies for soil use and management. If each parameter is considered as a dimension of the space spanned by the SQDB, extracting information becomes a difficult task when the number of parameters is >3. A widely used approach to explore multidimensional data sets is the self‐organizing map (SOM) method, which is suitable for clustering, visualization and extraction of information from multidimensional data. We applied the SOM method as an exploratory technique to an unlabelled SQDB to extract knowledge – data patterns and data associations – from the data set (the time and location of each sample were unknown). The SQDB used in this study is a set of 1240 unlabelled samples within the Central Valley of Chile, covering ca. 7500 km². The predominant soils are Andisols with a large organic matter content (7–12%), small bulk densities (0.6–1.0 g/cm³) and large water‐holding capacity. We identified three patterns: (i) an isolated region within the map with close neurons (smooth transitions), (ii) two or more regions with predominantly large or small values and (iii) a homogeneous map with small values and an isolated region of large values. These patterns show that the data set represented more than two groups that were not necessarily related. For pH, no important associations with other investigated parameters were observed. Previous studies carried out by the local agricultural research station showed that pH values below 5.5 constrain nutrient uptake. Thus, locations presenting pH < 5.5 should be subject to seasonal monitoring to assess management practices that mitigate soil acidity. The component plane for organic matter indicates that ca. 50% of the soil samples had contents <8%, related to soil series characteristics and management practices.
As k‐means is initialized by random partitions, the two‐step approach (clustering the map representing the input data) is less sensitive to variations in the input data (subsamples) than is the direct application of k‐means to the input data, and it also reduces the computational cost. The ability of SOMs to visualize multidimensional data sets helps gain an understanding of the data in the exploratory phase, such as the association and integration of physical, chemical and biological parameters.
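The two‐step approach described above can be illustrated with a minimal sketch: a small SOM is trained on the samples, k‐means is then applied to the map prototypes rather than to the raw data, and each sample inherits the cluster of its best‐matching unit. This is not the authors' implementation. The function names, map size, learning schedule and synthetic data are all assumptions chosen for illustration, and only the Python standard library is used so that the sketch stays self‐contained. Real soil parameters would normally be standardized first, because they are measured in different units.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train_som(data, rows, cols, epochs=10, lr0=0.5, seed=0):
    """Train a small rectangular SOM and return its prototype vectors."""
    rng = random.Random(seed)
    # Initialise every neuron from a randomly chosen sample (an assumption).
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))
            br, bc = grid[bmu]
            for i, (r, c) in enumerate(grid):
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                weights[i] = [w + lr * h * (xk - w) for w, xk in zip(weights[i], x)]
            step += 1
    return weights


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means started from a random partition of the points."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    centers = []
    for _ in range(iters):
        centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Empty cluster: reseed it from a random point.
                centers.append(list(rng.choice(points)))
        new_labels = [min(range(k), key=lambda j: sq_dist(centers[j], p))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels


def two_step_cluster(data, rows=3, cols=3, k=2, seed=0):
    """Step 1: SOM on the samples. Step 2: k-means on the SOM prototypes."""
    prototypes = train_som(data, rows, cols, seed=seed)
    _, proto_labels = kmeans(prototypes, k, seed=seed)
    # Each sample takes the cluster label of its best-matching unit.
    return [proto_labels[min(range(len(prototypes)),
                             key=lambda i: sq_dist(prototypes[i], x))]
            for x in data]


# Synthetic stand-in for a soil data set: two groups in three dimensions.
_rng = random.Random(1)
data = ([[_rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(20)]
        + [[_rng.gauss(3.0, 0.3) for _ in range(3)] for _ in range(20)])
labels = two_step_cluster(data, rows=3, cols=3, k=2, seed=0)
```

In this sketch k‐means only ever sees the 9 prototypes instead of the 40 samples, which is where the reduction in computational cost comes from. Rerunning `two_step_cluster` on different subsamples is one way to explore the sensitivity argument made above.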