Proceedings of the 2014 SIAM International Conference on Data Mining 2014
DOI: 10.1137/1.9781611973440.96

Density-Based Clustering Validation

Abstract: One of the most challenging aspects of clustering is validation, which is the objective and quantitative assessment of clustering results. A number of different relative validity criteria have been proposed for the validation of globular clusters. Not all data, however, are composed of globular clusters. Density-based clustering algorithms seek partitions with high-density areas of points (clusters, not necessarily globular) separated by low-density areas, possibly containing noise objects. In these cases rel…

Cited by 178 publications (118 citation statements)
References 32 publications
“…In [18,10,15] it is compared with a set of other internal measures and proven to be one of the most effective and generally applicable measures. However, when Silhouette is applied in the evaluation of k-means clustering validity, many more extra calculations are required, and the extra calculations increase following a power law corresponding to the size of the dataset, because the calculation of the Silhouette index is based on the full pairwise distance matrix over all data.…”
Section: Introduction
confidence: 99%
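
The cost argument in the statement above can be illustrated with a minimal sketch, assuming Euclidean distances, at least two clusters, and the usual convention that a singleton cluster scores 0. The function name `silhouette_from_distances` and the toy data are hypothetical, not taken from the cited work. The point is that the index reads from a full n × n distance matrix, so the work grows roughly quadratically with the number of objects.

```python
import numpy as np

def silhouette_from_distances(dist, labels):
    """Mean silhouette width computed from a full n x n distance matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # assumed convention: singleton clusters score 0
        # a(i): mean distance to the other members of i's own cluster
        a = dist[i, own].sum() / (n_own - 1)
        # b(i): smallest mean distance from i to any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Hypothetical toy data: two well-separated pairs of points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
# Full pairwise distance matrix: n * n entries, hence the quadratic cost.
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
score = silhouette_from_distances(dist, np.array([0, 0, 1, 1]))
```

For k-means, which itself only needs point-to-centroid distances, building this matrix is extra work on top of the clustering step.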
“…2) max{c_j(a), c_j(c), d(a, c)} = c_j(c). Analogously to the previous case, from (18) we get (21) , c). In this case, we get from Inequality (18) that d(a, c) ≥ d(a, b), which is a contradiction to Inequality (16)!…”
Section: The RNG w.r.t. Mutual Reachability Distance
confidence: 73%
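
The max{…} expression in the statement above has the form of a mutual reachability distance. The sketch below assumes that the c terms are core distances for a given mpts, that distances are Euclidean, and that the core distance is the distance to the mpts-th nearest neighbour counting the point itself (conventions differ on this). The function name `mutual_reachability` and the toy data are hypothetical.

```python
import numpy as np

def mutual_reachability(points, m_pts):
    """Return core distances and the mutual reachability distance matrix."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Assumed convention: the point itself counts as its own first neighbour,
    # so column m_pts - 1 of each sorted row is the core distance.
    core = np.sort(dist, axis=1)[:, m_pts - 1]
    # d_mreach(a, c) = max{core(a), core(c), d(a, c)}
    mreach = np.maximum(dist, np.maximum(core[:, None], core[None, :]))
    np.fill_diagonal(mreach, 0.0)  # distance of a point to itself stays 0
    return core, mreach

# Hypothetical one-dimensional toy data at positions 0, 1, 3 and 10.
core, mreach = mutual_reachability([[0.0], [1.0], [3.0], [10.0]], m_pts=2)
```

In this toy case the isolated point at 10 has a core distance of 7, so every mutual reachability distance involving it is at least 7, which is how sparse regions get pushed away from dense ones.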
“…A larger range of HDBSCAN* solutions for multiple values of mpts offers greater insight into a dataset, also providing additional opportunities for exploratory data analysis. For instance, using internal cluster validation measures such as DBCV [21], one can identify promising density levels from different hierarchies, produced from different parametric density estimates (based on mpts).…”
Section: Introduction
confidence: 99%
“…Halkidi and Vazirgiannis () introduced CDbw, an internal criterion taking density into account, and report that it outperforms five other criteria, including S_Dbw and DB, for DBScan and a version of Chameleon, according to ARI. Moulavi, Jaskowiak, Campello, Zimek, and Sander () introduced a criterion they call Density Based Clustering Validation (DBCV) and compared it to Silhouette, CH, Dunn, ℐ and CDbw on four artificial data sets exhibiting challenging geometry, three gene‐expression data sets, and four UCI data sets. The clustering techniques used were DBScan, OPTICS‐Autocluster, and HDBScan.…”
Section: Configuration/parameterization
confidence: 99%