We propose an algorithm for clustering short text fragments, such as lists of ingredients in food products, that combines the k-means algorithm with the chemical reaction optimization algorithm. The solution obtained with k-means serves as the initial molecular structure. Chemical reaction optimization, combined with k-means, then generates new solutions using the four main types of chemical reactions. Computational experiments showed the comparative efficiency of the new algorithm in terms of accuracy, precision, recall, and F-measure.
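Here is a minimal sketch of how a k-means solution can seed a chemical-reaction-style search. Several assumptions are not taken from the abstract. Ingredient lists become binary bag-of-words vectors. The energy of a molecule is the k-means sum of squared errors. The four reactions (on-wall ineffective collision, decomposition, inter-molecular ineffective collision, synthesis) are implemented as simple center-replacement and crossover moves, each followed by k-means refinement. Unlike full chemical reaction optimization, this sketch keeps no kinetic energy or energy buffer and simply retains the best molecules. All function names are hypothetical.

```python
import random


def vectorize(texts):
    # hypothetical preprocessing: binary bag-of-words over the ingredient vocabulary
    docs = [set(t.lower().replace(",", " ").split()) for t in texts]
    vocab = sorted(set().union(*docs))
    return [[1.0 if w in d else 0.0 for w in vocab] for d in docs]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centers):
    # "potential energy" of a molecule: the k-means sum of squared errors
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, iters=50):
    # Lloyd iterations; an empty cluster keeps its previous center
    centers = [list(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


def molecule(points, centers):
    # a molecule is (energy, centers) after k-means refinement
    c = kmeans(points, centers)
    return (sse(points, c), c)


def cro_kmeans(points, k, pop_size=6, steps=40, seed=0):
    rng = random.Random(seed)

    def perturb(c):
        # replace one randomly chosen center with a randomly chosen data point
        c = [list(x) for x in c]
        c[rng.randrange(k)] = list(rng.choice(points))
        return c

    start = molecule(points, rng.sample(points, k))  # k-means solution = initial structure
    pop = [start] + [molecule(points, perturb(start[1])) for _ in range(pop_size - 1)]
    for _ in range(steps):
        a, b = rng.sample(pop, 2)
        r = rng.random()
        if r < 0.25:    # on-wall ineffective collision: one molecule -> one
            new = [perturb(a[1])]
        elif r < 0.5:   # decomposition: one molecule -> two
            new = [perturb(perturb(a[1])), perturb(perturb(a[1]))]
        elif r < 0.75:  # inter-molecular ineffective collision: two -> two
            new = [perturb(a[1]), perturb(b[1])]
        else:           # synthesis: two molecules -> one, center-wise crossover
            new = [[list(rng.choice(pair)) for pair in zip(a[1], b[1])]]
        # elitist replacement instead of CRO's kinetic-energy bookkeeping
        pop = sorted(pop + [molecule(points, c) for c in new], key=lambda m: m[0])[:pop_size]
    return pop[0]
```

Calling `cro_kmeans(vectorize(texts), k)` returns the best `(energy, centers)` pair found. Each text is then labeled by its nearest center.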
Finding the cluster structure is essential for analyzing self-organized network structures such as social networks. Such problems admit a wide variety of distance measures. Many common clustering methods require the number of clusters to be specified before clustering starts. A preliminary step is therefore to decide, first, whether the data contain any clusters at all and, second, how many clusters the dataset contains. To reveal the internal structure of data, a family of methods for visual assessment of clustering tendency (the VAT family) has been developed. The vast majority of these methods use the Euclidean distance or the cosine similarity measure. In our study, we modified the VAT and iVAT algorithms so that they can visually assess clustering tendency with a wide variety of distance measures. We compared the results of our algorithms on both samples from data repositories and data from applied problems.
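The classic VAT reordering and the iVAT transform can be sketched with the distance measure passed in as a parameter, which is what makes a variety of measures usable. This is the textbook procedure, not the authors' modified code. VAT performs a Prim-like ordering that starts from a row holding the largest dissimilarity. iVAT then replaces each entry with the minimax path distance. The three distance functions are illustrative choices, and treating a zero vector as maximally dissimilar under the cosine distance is an assumption. The ordering loop is O(n³) for readability.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0  # assumption: a zero vector is maximally dissimilar
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


def dissimilarity_matrix(data, dist):
    # the distance measure is a parameter, so any metric can be plugged in
    return [[dist(a, b) for b in data] for a in data]


def vat(R):
    # VAT ordering: start from a row holding the largest dissimilarity, then
    # repeatedly append the unordered object closest to the ordered set (Prim-like)
    n = len(R)
    first = max(range(n), key=lambda r: max(R[r]))
    order, rest = [first], set(range(n)) - {first}
    while rest:
        _, j = min((R[p][q], q) for p in order for q in rest)
        order.append(j)
        rest.remove(j)
    return order, [[R[p][q] for q in order] for p in order]


def ivat(Rs):
    # iVAT: replace each entry of the VAT-ordered matrix by the minimax path distance
    n = len(Rs)
    D = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        j = min(range(r), key=lambda k: Rs[r][k])  # parent of r in the VAT tree
        for c in range(r):
            D[r][c] = Rs[r][j] if c == j else max(Rs[r][j], D[j][c])
            D[c][r] = D[r][c]
    return D
```

Rendering the reordered matrix (or its iVAT transform) as a grayscale image shows dark blocks along the diagonal. Their count suggests the number of clusters.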
Automatic grouping (clustering) involves dividing a set of objects into subsets (groups) so that, according to some criterion, objects in the same subset are more similar to each other than to objects in other subsets. Kohonen neural networks are a class of artificial neural networks whose main element is a layer of adaptive linear adders operating on the “winner takes all” principle. One advantage of Kohonen networks is their capacity for online clustering. Greedy agglomerative clustering procedures successively improve the result in a neighborhood of a known solution. At each step, they choose as the next solution the option that yields the smallest increase in the objective function. Algorithms using greedy agglomerative heuristics have demonstrated precise and stable results for the k-means model. In our study, we propose a greedy agglomerative heuristic algorithm based on a Kohonen neural network with various distance measures for clustering industrial products. Computational experiments demonstrate the comparative efficiency and accuracy of the greedy agglomerative heuristic in grouping industrial products into homogeneous production batches.
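The two ingredients can be illustrated together under simplifying assumptions. The first is online winner-takes-all training, in which only the neuron closest to the current sample moves toward it. The second is a greedy agglomerative loop that starts with more neurons than needed and repeatedly removes the neuron whose removal increases the sum of squared distances the least. For brevity, each candidate removal is scored without retraining, and the network is retrained only after the chosen removal. Squared Euclidean distance stands in for the distance measures studied. The names `kohonen_wta`, `greedy_agglomerative`, `extra`, and `lr0` are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def winner(x, w):
    # index of the neuron whose weight vector is closest to x
    return min(range(len(w)), key=lambda j: sq_dist(x, w[j]))


def kohonen_wta(data, weights, epochs=20, lr0=0.5, seed=0):
    # online winner-takes-all learning: only the winning neuron moves toward x
    rng = random.Random(seed)
    w = [list(v) for v in weights]
    order = list(range(len(data)))
    for e in range(epochs):
        lr = lr0 * (1 - e / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x = data[i]
            j = winner(x, w)
            w[j] = [wj + lr * (xj - wj) for wj, xj in zip(w[j], x)]
    return w


def objective(data, w):
    # sum of squared distances from each sample to its winning neuron
    return sum(sq_dist(x, w[winner(x, w)]) for x in data)


def greedy_agglomerative(data, k, extra=3, seed=0):
    # start with k + extra neurons and remove them one at a time
    rng = random.Random(seed)
    w = kohonen_wta(data, rng.sample(data, k + extra), seed=seed)
    while len(w) > k:
        # pick the neuron whose removal increases the objective the least
        drop = min(range(len(w)), key=lambda j: objective(data, w[:j] + w[j + 1:]))
        w = kohonen_wta(data, w[:drop] + w[drop + 1:], seed=seed)
    return w
```

The returned weight vectors act as cluster prototypes. Each product is assigned to the batch of its winning neuron.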
Many real-world problems of automatic grouping (clustering) of objects require a well-founded solution and an interpretable result. A more specific problem is identifying homogeneous subgroups of objects. Here the number of groups in the dataset is not specified, and the proposed grouping model must be justified and described. As a tool for interpretable machine learning, we consider formal concept analysis (FCA). To reduce a problem with real-valued attributes to one that FCA can handle, we search for the optimal number and location of cut points and optimize the support set of attributes. The approach was tested on tasks for which interpretability is important. The first is clustering industrial products (for example, transistors, diodes, and microcircuits) according to primary test results. The second is analyzing gene expression data collected for predicting cancerous tumors. For the data under consideration, logical concepts are identified and organized into a lattice of formal concepts. The revealed concepts are evaluated by informativeness indicators and can be regarded as homogeneous subgroups of elements together with their feature-based descriptions. The proposed approach thus singles out homogeneous subgroups of elements and describes their characteristics, which can be considered stricter norms that the elements of each subgroup satisfy. We compare the approach with COBWEB, an algorithm for conceptual clustering of objects aimed at discovering probabilistic concepts. The resulting lattices of logical concepts and probabilistic concepts for the considered datasets are simple and easy to interpret.
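Here is a minimal sketch of the FCA side. It assumes the cut points are already given; the search for their optimal number and location, and the optimization of the support set, are not shown. Each real attribute is binarized into `x<=c` / `x>c` features. All formal concepts are then enumerated by closing the set of object intents under intersection. This naive method is exponential in the worst case but adequate for small contexts. The names `binarize` and `concepts` are hypothetical.

```python
def binarize(rows, cuts):
    # cuts: {attribute index: [cut points]} -> one binary feature per cut point
    context = []
    for row in rows:
        features = set()
        for a, points in cuts.items():
            for c in points:
                features.add(f"x{a}<={c}" if row[a] <= c else f"x{a}>{c}")
        context.append(frozenset(features))
    return context


def concepts(context):
    # every concept intent is an intersection of object intents
    # (the empty intersection is the full feature set)
    intents = {frozenset().union(*context)}
    for g in context:
        intents |= {g & b for b in intents}
    result = []
    for b in intents:
        extent = frozenset(i for i, g in enumerate(context) if b <= g)
        result.append((extent, b))
    # smaller extents (more specific concepts) first
    return sorted(result, key=lambda pair: (len(pair[0]), sorted(pair[1])))
```

Each returned `(extent, intent)` pair is a candidate subgroup together with the thresholds that all of its members satisfy. Ordering the pairs by extent inclusion gives the concept lattice.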