In many clustering algorithms, such as K-means and fuzzy c-means (FCM), the number of clusters K must be known beforehand. In this paper, we propose a new method that determines the cluster number without running K-means for every candidate K. We introduce a new statistic, RVR (ratio of variance to range), and conduct a Monte Carlo analysis of its characteristics. Based on the RVR, we propose an algorithm that determines the cluster number K and then performs clustering with it. We evaluate its effectiveness in simulation tests on two types of datasets: first, real datasets whose true number of clusters and components are known, and second, synthetic datasets. We observe a significant improvement in both the speed and the quality of determining the cluster number, and therefore of the clustering itself. Finally, we hope the proposed algorithm will be used efficiently and widely for clustering multidimensional data.