We compare the results of splitting batches of industrial products (semiconductor devices) into several presumably homogeneous production batches using the standard k-means and p-median clustering models with various normalization methods. Squared Euclidean distances are the most popular choice in clustering problems. We use them, as well as Mahalanobis distances, to measure the differences (distances) between products in the normalized feature space. We compare the clustering results using the Rand index and empirically establish the advantage of the p-median model.
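Since the comparison above rests on the Rand index, a minimal sketch of how that index can be computed may help. It counts the share of object pairs on which two partitions agree (both place the pair in the same cluster, or both separate it). The function name `rand_index` and the toy label lists are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    Illustrative helper (hypothetical name): labels_a and labels_b are
    equal-length sequences of cluster labels for the same objects.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two objects")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair co-clustered in partition A?
        same_b = labels_b[i] == labels_b[j]  # pair co-clustered in partition B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Toy data (assumed): known batch membership vs. a clustering result.
true_batches = [0, 0, 1, 1]
found_clusters = [1, 1, 0, 0]  # same partition, labels renamed
score = rand_index(true_batches, found_clusters)
```

The index depends only on which objects are grouped together, not on the label names, so a clustering that recovers the true batches under different labels still scores 1.0.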
We examine the problem of choosing the search radius for local concentrations in the FOREL-2 clustering algorithm with an initial number of clusters. Our approach aims to improve the accuracy and stability of results in tasks such as identifying homogeneous batches of industrial products. We examined the k-means and FOREL-2 algorithms using test data normalized by standard deviation and by valid parameter values, for the problem of automatic classification of objects in a multidimensional space of measured parameters. For such problems, we apply greedy heuristic procedures to select the radius of local concentrations in the FOREL-2 algorithm. According to the obtained Rand index, the FOREL-2 approach demonstrated the best accuracy, with a larger value of the objective function than the k-means algorithm. The accuracy and speed of the software implementation are quite acceptable for clustering electronic radio products based on test data. Using greedy heuristics to choose the search radius for local concentrations in FOREL-2 with a specified number of clusters gives an advantage in the speed of exact clustering over the k-means algorithm that uses greedy heuristics to choose centroids.
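To make the role of the search radius concrete, here is a minimal sketch of the basic FOREL idea: a sphere of fixed radius is moved to the centroid of the points it covers until it stops moving, the covered points form a cluster, and the process repeats on the remaining points. The radius selection shown is a plain scan over candidate values, used here only as a stand-in; it is not the greedy heuristic of the paper, and the names `forel`, `pick_radius`, and the toy points are assumptions.

```python
import math


def forel(points, radius, max_iter=100):
    """Basic FOREL sketch: returns one cluster label per point."""
    labels = [-1] * len(points)
    remaining = list(range(len(points)))
    cluster = 0
    while remaining:
        # Deterministic start: the first point that is still unassigned.
        center = tuple(points[remaining[0]])
        members = [remaining[0]]
        for _ in range(max_iter):
            inside = [i for i in remaining
                      if math.dist(points[i], center) <= radius]
            if not inside:  # guard against floating-point edge cases
                break
            members = inside
            new_center = tuple(
                sum(points[i][d] for i in members) / len(members)
                for d in range(len(center))
            )
            if new_center == center:  # sphere stopped moving
                break
            center = new_center
        for i in members:
            labels[i] = cluster
        remaining = [i for i in remaining if labels[i] == -1]
        cluster += 1
    return labels


def pick_radius(points, target_k, candidates):
    """Stand-in radius search: the largest candidate giving target_k clusters."""
    for r in sorted(candidates, reverse=True):
        labels = forel(points, r)
        if max(labels) + 1 == target_k:
            return r, labels
    return None


# Toy data (assumed): two well-separated groups in a 2-D feature space.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = pick_radius(pts, target_k=2, candidates=[100.0, 2.0, 0.1])
```

A very large radius merges everything into one cluster and a very small one isolates every point, which is why the choice of radius determines whether the specified number of clusters is reached.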