K-means is one of the most commonly used data mining algorithms. It clusters quickly but performs poorly on non-spherical data. An improved K-means algorithm (IK-means) is proposed to cluster non-spherical data more effectively. The original dataset is first clustered into a relatively large number of high-density sub-clusters, and the final result is obtained by merging connected sub-clusters. Connectivity between sub-clusters is evaluated from the sub-cluster density and the nearest distance between sub-clusters. In tests on University of California, Irvine (UCI) datasets and several artificial datasets, comparison of the proposed IK-means algorithm with DBSCAN and KGFCM shows its ability to cluster data of arbitrary shape. On a dataset of size 72,000, its Adjusted Rand Index (ARI) is 24% higher than that of DBSCAN and 95.2% higher than that of KGFCM. On larger datasets, IK-means is also faster than DBSCAN and KGFCM.
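The over-cluster-then-merge idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's IK-means: the function names, the evenly spaced center initialization, and the `link_dist` threshold are all assumptions, and connectivity here uses only the nearest distance between sub-clusters, whereas the abstract says density is also taken into account.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; returns one sub-cluster label per point."""
    # Assumption: deterministic init from evenly spaced input points.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def merge_subclusters(points, labels, k, link_dist):
    """Merge sub-clusters whose closest pair of points is within link_dist."""
    parent = list(range(k))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    groups = [[p for p, l in zip(points, labels) if l == c] for c in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            if not groups[a] or not groups[b]:
                continue
            gap = min(math.dist(p, q) for p in groups[a] for q in groups[b])
            if gap <= link_dist:
                parent[find(a)] = find(b)
    return [find(l) for l in labels]


# Two long parallel strips: a non-spherical layout for plain K-means.
strip_a = [(0.5 * i, 0.0) for i in range(40)]
strip_b = [(0.5 * i, 5.0) for i in range(40)]
data = strip_a + strip_b

sub_labels = kmeans(data, k=8)                      # deliberately over-cluster
final = merge_subclusters(data, sub_labels, k=8, link_dist=1.0)
print(len(set(final)))                              # number of merged clusters
```

The brute-force nearest-pair search is quadratic in the number of points, so it is only suitable for small illustrations like this one.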
Clustering is an important part of artificial intelligence and is widely used in data mining, pattern recognition, natural language processing, computer vision, and other fields. Clustering algorithms are complex and difficult to master. To make learning easier and more engaging, a visual clustering experiment is designed. It first introduces clustering algorithms and clustering indicators. Basic experiments then show the working principle and clustering characteristics of each algorithm by visualizing both the clustering process and the clustering results, and apply the algorithms to self-generated artificial data to verify those characteristics. Finally, application experiments on face clustering and image segmentation are used to raise students' interest in learning.
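As an example of a clustering indicator, the Adjusted Rand Index (ARI) can be computed from pair counts alone. The abstract does not say which indicators the experiment covers, so the choice of ARI and the function name below are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """ARI between two labelings of the same items (1.0 = identical partitions)."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs grouped together in both labelings, in truth only, in pred only.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred).values())
    # Agreement expected by chance, and the maximum attainable agreement.
    expected = pairs_truth * pairs_pred / comb(n, 2)
    max_index = (pairs_truth + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (pairs_both - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # relabeled but identical
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # worse than chance
print(same, crossed)
```

Because ARI depends only on which items are grouped together, swapping label names does not change the score, which makes it convenient for comparing algorithms that number their clusters differently.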
CLIQUE is a grid- and density-based clustering algorithm that is sensitive to the grid division parameter M and the density threshold R, and its clustering accuracy is poor. This paper proposes an improved algorithm that is insensitive to the initial grid division parameter M and can split the grid automatically. Combining it with the K-means algorithm improves the clustering result. On the S1 dataset, its F value is 36.9% higher than that of the classic CLIQUE algorithm.
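To make the roles of M and R concrete, here is a minimal sketch of grid-and-density clustering in the full data space. It is not CLIQUE itself (which also searches subspaces) and not the improved algorithm from the paper. The function name is hypothetical, `m` is taken to be the number of equal-width intervals per dimension, and `r` is assumed to be a minimum point count per cell.

```python
from collections import Counter


def grid_clusters(points, m, r):
    """Label points by connected dense grid cells; -1 marks points in sparse cells."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        # Map each coordinate to one of m equal-width intervals.
        return tuple(
            min(m - 1, int((p[d] - lo[d]) / (hi[d] - lo[d]) * m))
            if hi[d] > lo[d] else 0
            for d in range(dims)
        )

    cells = [cell_of(p) for p in points]
    # A cell is dense when it holds at least r points.
    dense = {c for c, n in Counter(cells).items() if n >= r}

    # Flood-fill over dense cells that share a face.
    cluster_of = {}
    next_id = 0
    for start in sorted(dense):
        if start in cluster_of:
            continue
        cluster_of[start] = next_id
        stack = [start]
        while stack:
            cur = stack.pop()
            for d in range(dims):
                for step in (-1, 1):
                    nb = cur[:d] + (cur[d] + step,) + cur[d + 1:]
                    if nb in dense and nb not in cluster_of:
                        cluster_of[nb] = next_id
                        stack.append(nb)
        next_id += 1
    return [cluster_of.get(c, -1) for c in cells]


blob_a = [(0.2 * i, 0.2 * j) for i in range(3) for j in range(3)]
blob_b = [(9.6 + 0.2 * i, 9.6 + 0.2 * j) for i in range(3) for j in range(3)]
outlier = [(5.0, 5.0)]
labels = grid_clusters(blob_a + blob_b + outlier, m=5, r=3)
print(labels)
```

With a much finer grid, the same blobs can be spread over many cells that each fall below the threshold, which is one way the sensitivity to M and R described in the abstract shows up.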