This paper proposes a new method called depth difference (DeD), for estimating the optimal number of clusters (k) in a dataset based on data depth. The DeD method estimates the k parameter before actual clustering is constructed. We define the depth within clusters, depth between clusters, and depth difference to finalize the optimal value of k, which is an input value for the clustering algorithm. The experimental comparison with the leading state-of-the-art alternatives demonstrates that the proposed DeD method outperforms.
Clustering is a key method in unsupervised learning with various applications in data mining, pattern recognition and intelligent information processing. However, the number of groups to be formed, usually notated as [Formula: see text] is a vital parameter for most of the existing clustering algorithms as their clustering results depend heavily on this parameter. The problem of finding the optimal [Formula: see text] value is very challenging. This paper proposes a novel idea for finding the correct number of groups in a dataset based on data depth. The idea is to avoid the traditional process of running the clustering algorithm over a dataset for [Formula: see text] times and further, finding the [Formula: see text] value for a dataset without setting any specific search range for [Formula: see text] parameter. We experiment with different indices, namely CH, KL, Silhouette, Gap, CSP and the proposed method on different real and synthetic datasets to estimate the correct number of groups in a dataset. The experimental results on real and synthetic datasets indicate good performance of the proposed method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.