Cluster analysis is the process of grouping a collection of physical or abstract objects into classes of similar objects. Determining the optimal number of clusters for a data set is central to the clustering problem, because it decides whether the data set can be partitioned effectively. Cluster validity research establishes validity indices, evaluates clustering quality and determines the optimal number of clusters. A validity function for the fuzzy C-means (FCM) clustering algorithm is proposed, defined as the ratio of intra-class compactness to inter-class separation; its minimum indicates the best clustering. The proposed validity function is then compared with well-known validity functions in simulation experiments on FCM clustering performance. Three kinds of data sets are used: three classical data sets, two artificial data sets and six real data sets from the UCI database. The simulation results show that the proposed validity function can effectively partition the data sets.

INDEX TERMS Clustering analysis, clustering validity index, fuzzy C-means clustering algorithm.
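To make the "compactness divided by separation" idea concrete, the sketch below computes the classical Xie-Beni index, a well-known validity function of this form. It is not the function proposed in the abstract above, whose exact definition is not given here. The function name, argument layout and toy data are illustrative assumptions.

```python
def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni index: fuzzy compactness over minimum center separation.

    points:      list of n points, each a list of floats
    centers:     list of c >= 2 cluster centers
    memberships: c x n matrix, memberships[i][j] is the degree to which
                 point j belongs to cluster i
    m:           fuzzy weighting exponent (the original index uses m = 2)
    Lower values indicate a better partition.
    """
    n = len(points)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Intra-class compactness: membership-weighted squared distances.
    compactness = sum(
        memberships[i][j] ** m * sq_dist(points[j], centers[i])
        for i in range(len(centers))
        for j in range(n)
    )
    # Inter-class separation: smallest squared distance between two centers.
    separation = min(
        sq_dist(centers[i], centers[k])
        for i in range(len(centers))
        for k in range(i + 1, len(centers))
    )
    return compactness / (n * separation)


# Toy data (assumed for illustration): two tight, well-separated groups.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
good = xie_beni(points, [[0.0, 0.5], [10.0, 0.5]], crisp)
bad = xie_beni(points, [[0.0, 0.5], [1.0, 0.5]], crisp)
print(good < bad)  # well-placed centers score lower
```

With the same memberships, moving the second center away from its group both inflates the compactness term and shrinks the separation term, so the index rises sharply.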
Clustering, as an unsupervised learning method, is the process of dividing data objects or observations into subsets; that is, the data are classified by learning from observation rather than from examples, without the guidance of prior class label information. The bat algorithm (BA) is a swarm intelligence optimization algorithm inspired by the ultrasonic echolocation that bats use when foraging, but it is easily trapped in local minima and its accuracy is limited. An improved bat algorithm is therefore proposed. In the global search, a Gaussian-like convergence factor is added, and five further convergence factors are proposed to improve the global optimization ability of the algorithm. In the local search, the hunting mechanism of the whale optimization algorithm (WOA) and a sine position updating strategy are adopted to improve the local optimization ability of the algorithm. The clustering performance of the improved bat algorithm is compared with that of the bat algorithm, the flower pollination algorithm (FPA), the harmony search (HS) algorithm, the whale optimization algorithm and the particle swarm optimization (PSO) algorithm on seven real data sets under the six convergence factors. The simulation results show that the improved bat algorithm clusters better than the other intelligent optimization algorithms.
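A swarm optimizer such as the bat algorithm needs a fitness function before it can be applied to clustering. The abstract does not state which objective is used, so the sketch below shows one common choice as an assumption: each candidate position encodes all cluster centers as one flat vector, and the fitness is the sum of squared distances from every point to its nearest center. The name `clustering_fitness` and the encoding are hypothetical.

```python
def clustering_fitness(position, points, n_clusters):
    """Sum of squared distances from each point to its nearest decoded center.

    position:   flat list of length n_clusters * dim (assumed encoding)
    points:     list of data points, each a list of dim floats
    Lower values mean tighter clusters, so an optimizer would minimize this.
    """
    dim = len(points[0])
    # Decode the flat position vector into n_clusters centers.
    centers = [position[k * dim:(k + 1) * dim] for k in range(n_clusters)]
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers
        )
    return total


# Toy data (assumed for illustration).
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
print(clustering_fitness([0.0, 1.0, 10.0, 1.0], points, 2))  # centers on the groups
print(clustering_fitness([5.0, 1.0, 5.0, 1.0], points, 2))   # both centers in the middle
```

Any of the optimizers named above could then search over `position` vectors, with the variant that reaches the lowest fitness judged to cluster best under this objective.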
A clustering validity function is an index used to judge the accuracy of clustering results. At present, most studies on clustering validity rely on a single validity function. Research shows, however, that no single validity function can handle every data set or consistently outperform the other indices. Therefore, a hybrid weighted combination evaluation method based on fuzzy C-means (FCM) clustering validity functions is proposed. The weighting method combines expert weighting with information entropy weighting, which reduces the influence of subjective factors in expert weighting and compensates for the inability of entropy weighting to judge the value of each validity function. Four ways of combining the validity functions are studied: linear, exponential, logarithmic and proportional. Finally, the proposed fuzzy clustering validity evaluation method is verified by experiments on artificial data sets and UCI data sets. The experimental results show that the proposed method overcomes the shortcomings of a single validity function and determines the optimal number of clusters more accurately across different data sets.
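The hybrid weighting idea can be sketched as follows. The entropy weights follow the standard entropy weight method; the way they are blended with expert weights (a convex mix controlled by `alpha`) is an assumption, since the abstract does not give the formula. Only the linear combination is shown, and the scores are assumed to be positive and already scaled so that all indices point in the same direction.

```python
import math


def entropy_weights(scores):
    """Standard entropy weight method.

    scores[i][j]: positive value of validity index j for candidate i
    (for example, candidate i is one possible number of clusters).
    Requires at least two candidates.
    """
    n = len(scores)
    diversities = []
    for j in range(len(scores[0])):
        column = [row[j] for row in scores]
        total = sum(column)
        probs = [v / total for v in column]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # An index that barely varies across candidates carries little weight.
        diversities.append(1.0 - entropy)
    norm = sum(diversities)
    if norm == 0.0:
        # Every index is uniform across candidates: fall back to equal weights.
        return [1.0 / len(diversities)] * len(diversities)
    return [d / norm for d in diversities]


def hybrid_weights(expert, entropy, alpha=0.5):
    """Assumed blend: alpha * expert weight + (1 - alpha) * entropy weight."""
    return [alpha * e + (1.0 - alpha) * h for e, h in zip(expert, entropy)]


def linear_combination(scores, weights):
    """Weighted sum of the index values for each candidate."""
    return [sum(w * v for w, v in zip(weights, row)) for row in scores]


# Toy scores (assumed): two candidates, two validity indices.
scores = [[1.0, 1.0], [1.0, 3.0]]
w_entropy = entropy_weights(scores)      # first index is uniform, so it gets ~0
w_hybrid = hybrid_weights([0.5, 0.5], w_entropy)
print(linear_combination(scores, w_hybrid))
```

In this toy case the first index does not discriminate between candidates, so entropy weighting alone would discard it, while the expert weights keep it in play at a reduced weight.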
The fuzzy C-means (FCM) clustering algorithm is widely used in data mining. A major limitation, however, is that the number of clusters must be specified in advance, so finding the optimal number of clusters is very important. A new validity function for the FCM clustering algorithm is therefore proposed to verify the validity of the clustering results. The function is defined in terms of intra-class compactness and inter-class separation, drawing on the fuzzy membership matrix, the data similarity between classes and the geometric structure of the data set; its minimum value indicates the optimal clustering partition. The proposed validity function and seven traditional validity functions are experimentally verified on four artificial data sets and six UCI data sets. The simulation results show that the proposed validity function identifies the optimal number of clusters more accurately and continues to do so when the fuzzy weighting exponent is varied, which indicates strong adaptability and robustness.
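The procedure described above (run FCM for several candidate cluster numbers, then keep the one that minimizes a validity function) can be sketched as below. This is a minimal illustration under stated assumptions: the FCM loop is the textbook alternating update, the Xie-Beni index stands in for the proposed validity function (whose formula is not given here), and all function names and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means; returns (centers, c x n membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each point's degrees sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= s
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[j] * points[j][d] for j in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update from squared distances (floored to avoid 0 ** -x).
        shift = 0.0
        for j in range(n):
            inv = [max(sq_dist(points[j], centers[i]), 1e-12) ** (-1.0 / (m - 1.0))
                   for i in range(c)]
            s = sum(inv)
            for i in range(c):
                new = inv[i] / s
                shift = max(shift, abs(new - u[i][j]))
                u[i][j] = new
        if shift < tol:
            break
    return centers, u


def xie_beni(points, centers, u, m=2.0):
    """Stand-in validity function (lower is better); needs c >= 2."""
    n = len(points)
    compactness = sum(u[i][j] ** m * sq_dist(points[j], centers[i])
                      for i in range(len(centers)) for j in range(n))
    separation = min(sq_dist(centers[i], centers[k])
                     for i in range(len(centers))
                     for k in range(i + 1, len(centers)))
    if separation == 0.0:
        return float("inf")  # coincident centers: treat as worst case
    return compactness / (n * separation)


def scan_cluster_counts(points, candidates, m=2.0):
    """Run FCM for each candidate c >= 2 and return the minimizer and all scores."""
    scores = {}
    for c in candidates:
        centers, u = fcm(points, c, m=m)
        scores[c] = xie_beni(points, centers, u, m=m)
    return min(scores, key=scores.get), scores


# Toy data (assumed): two compact groups far apart.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
best_c, scores = scan_cluster_counts(points, [2, 3, 4])
print(best_c)
```

Passing a different `m` to `scan_cluster_counts` gives a simple way to check whether the selected cluster number stays stable as the fuzzy weighting exponent changes, which is the robustness property the abstract reports.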