Abstract. As a formal, systematic subject of study, clustering can be considered the most influential unsupervised learning problem: like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. One of the central difficulties associated with it is determining the number of clusters. In this chapter, an efficient grouping genetic algorithm is proposed for the case where the number of clusters is unknown. Within each chromosome of the grouping genetic algorithm, the same data are clustered concurrently with different numbers of clusters in order to identify the correct number. In subsequent iterations of the algorithm, efficient crossover and mutation operators produce new solutions with a different number of clusters or a different clustering accuracy, which leads to a significant improvement in the clustering. Furthermore, a local search is applied with a given probability to each chromosome of each new population in order to increase the accuracy of the clustering. These special operators support the successful application of the proposed method to big data analysis. To demonstrate its accuracy and efficiency, the algorithm is tested on various artificial and real data sets and compared with other methods. Most of the data sets consist of overlapping clusters, yet the algorithm detected the correct number of clusters for all data sets with high clustering accuracy. The results are evidence that the algorithm succeeds in finding an appropriate number of clusters and achieves the best clustering quality among the compared methods.
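The loop described above (a population of candidate clusterings with varying numbers of clusters, group-level crossover and mutation, and a probabilistic local search) can be illustrated with a minimal sketch. It is not the chapter's algorithm: the abstract does not specify the encoding, the operators, or the fitness function, so everything here is an assumption made for illustration. Each chromosome is taken to be one label vector, the fitness is a simplified centroid-based silhouette score, crossover injects one group from the second parent, mutation splits or merges groups, and the local search is a single nearest-centroid reassignment pass. All function and parameter names (`fitness`, `crossover`, `mutate`, `local_search`, `cluster`, `p_local`, `max_k`) are hypothetical.

```python
import random
from math import dist


def centroids(points, labels):
    """Mean point of every group that appears in the label vector."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    return {g: tuple(sum(c) / len(m) for c in zip(*m)) for g, m in groups.items()}


def fitness(points, labels):
    """Assumed fitness: centroid-based silhouette score, higher is better.
    Returns -1.0 when the chromosome contains fewer than two groups."""
    cents = centroids(points, labels)
    if len(cents) < 2:
        return -1.0
    total = 0.0
    for p, g in zip(points, labels):
        a = dist(p, cents[g])                                   # own centroid
        b = min(dist(p, c) for h, c in cents.items() if h != g)  # nearest other
        total += (b - a) / max(a, b, 1e-12)
    return total / len(points)


def local_search(points, labels):
    """One reassignment pass: move every point to its nearest centroid."""
    cents = centroids(points, labels)
    return [min(cents, key=lambda g: dist(p, cents[g])) for p in points]


def crossover(a, b, rng):
    """Group-level crossover: copy parent a, then inject one whole group of b."""
    child = list(a)
    group = rng.choice(sorted(set(b)))
    new_label = max(a) + 1
    for i, g in enumerate(b):
        if g == group:
            child[i] = new_label
    return child


def mutate(labels, rng):
    """Change the number of groups: merge two groups or randomly split one."""
    labels = list(labels)
    groups = sorted(set(labels))
    if len(groups) > 2 and rng.random() < 0.5:
        keep, drop = rng.sample(groups, 2)
        return [keep if g == drop else g for g in labels]
    target = rng.choice(groups)
    new_label = max(groups) + 1
    return [new_label if g == target and rng.random() < 0.5 else g for g in labels]


def cluster(points, max_k=6, pop_size=20, generations=40, p_local=0.3, seed=0):
    """Evolve label vectors; return the best one and its number of clusters."""
    rng = random.Random(seed)
    pop = []
    for _ in range(pop_size):
        k = rng.randint(2, max_k)  # each chromosome starts with its own k
        pop.append([rng.randrange(k) for _ in points])
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(points, c), reverse=True)
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, rng), rng)
            if rng.random() < p_local:  # local search with probability p_local
                child = local_search(points, child)
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda c: fitness(points, c))
    return best, len(set(best))
```

Calling `cluster(points)` on a list of coordinate tuples returns a label vector and the number of distinct labels in it. Because the number of groups is free to change through crossover and mutation, the estimate of the cluster count comes out of the search itself; how well that estimate matches the true structure depends entirely on the assumed fitness function, which is much cruder than what a full method would use.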