Accurately clustering large, high-dimensional datasets is a challenging problem in unsupervised learning. K-means is a fast, widely used, centroid-based partitioning algorithm that is accurate on datasets with spherical clusters. However, three weaknesses make it a poor candidate for clustering large, high-dimensional datasets: it is non-deterministic, it depends heavily on the choice of initial cluster centers, and it is vulnerable to noise. To overcome these limitations, we develop a novel, nature-inspired, centroid-based clustering algorithm based on principles of particle physics. Our method avoids both convergence to local optima and non-deterministic outputs. We evaluate the method on large datasets of human face images. Our method also addresses outliers and data that are not well separated, both of which occur in these datasets. We use a deep learning model to extract the facial features of each image into a 128-dimensional vector. We validate the quality and accuracy of our methods using several statistical measures: F-measure, accuracy, error rate, average in-group proportion, and normalized cluster size Rand index. These evaluations show that our method clusters large face image datasets with higher accuracy and quality than existing methods. The advantage of our algorithms becomes more pronounced as the dataset grows.
INDEX TERMS Centroid Based Celestial Clustering, Face clustering on large datasets, Optimization on clustering, Refined Celestial PSO Clustering
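The abstract lists F-measure among its evaluation measures but does not define it here. As an illustration only, the sketch below assumes the common *pairwise* definition: two images count as a true positive pair when they share both a ground-truth identity and a predicted cluster. The function name `pairwise_f_measure` is hypothetical, and the paper's own F-measure may be defined differently (for example, by matching each class to its best cluster). Pairs are counted from cluster sizes rather than by enumerating all pairs, so the cost stays linear in the number of images, which matters for large face datasets.

```python
from collections import Counter
from math import comb


def pairwise_f_measure(labels_true, labels_pred):
    """Return (precision, recall, F1) over item pairs.

    Assumed pairwise definition (not taken from the paper): a pair is a
    true positive when both labelings place the two items together.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")

    # Pairs grouped together by BOTH the ground truth and the clustering.
    joint = Counter(zip(labels_true, labels_pred))
    tp = sum(comb(n, 2) for n in joint.values())

    # Pairs grouped together by the clustering (true + false positives).
    pred_pairs = sum(comb(n, 2) for n in Counter(labels_pred).values())

    # Pairs that share a ground-truth identity (true positives + false negatives).
    true_pairs = sum(comb(n, 2) for n in Counter(labels_true).values())

    precision = tp / pred_pairs if pred_pairs else 0.0
    recall = tp / true_pairs if true_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two identities ("a", "b"); the clustering splits one "a" image off
    # into the cluster that holds the "b" images.
    print(pairwise_f_measure(["a", "a", "a", "b", "b"], [0, 0, 1, 1, 1]))
```

For the example above, two of the four predicted same-cluster pairs are correct and two of the four same-identity pairs are recovered, so precision, recall and F1 are all 0.5.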