2012 Brazilian Symposium on Neural Networks
DOI: 10.1109/sbrn.2012.25

A Comparison of External Clustering Evaluation Indices in the Context of Imbalanced Data Sets

Abstract: For highly imbalanced data sets, almost all the instances are labeled as one class, whereas far fewer examples are labeled as the other classes. In this paper, we present an empirical comparison of seven different clustering evaluation indices when used to assess partitions generated from highly imbalanced data sets. Some of the metrics are based on matching of sets (F-measure), information theory (normalized mutual information and adjusted mutual information), and counting pairs of objects (Ra…
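To make the pair-counting family mentioned in the abstract concrete, here is a minimal sketch in plain Python. It uses the unadjusted Rand index as one standard pair-counting index; that choice, the function name `rand_index`, and the 95/5 toy labels are assumptions for illustration and are not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which the two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects together, or both partitions keep them apart.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical, highly imbalanced reference labels: 95 objects vs. 5 objects.
labels_true = [0] * 95 + [1] * 5

# A trivial partition that puts every object in a single cluster.
one_cluster = [0] * 100

ri_trivial = rand_index(labels_true, one_cluster)
ri_perfect = rand_index(labels_true, labels_true)

print(f"single-cluster partition: {ri_trivial:.3f}")
print(f"perfect partition:        {ri_perfect:.3f}")
```

On this toy data the single-cluster partition already agrees with the reference on 4475 of the 4950 pairs, because most pairs fall inside the large class. This is one reason an index can behave differently on imbalanced data than on balanced data.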

Cited by 33 publications (21 citation statements)
References 10 publications
“…For each example, the first column shows the value obtained with the metric for the left partition, the second column shows the result for the right partition and the third column indicates if the formal constraint is satisfied ( ) or Note that none of current metrics can satisfy all constraints. Indeed, F_{b^3} satisfies the first 4 F.C., but misses the correct identification of the best partition for the unbalanced case as reported by [4]. However, the proposed modifications F^{mod&0.9}_{b^3} (with |x| = 3) and F^{0.9}_{b^3} manage to correctly classify all the formal constraints using the parameter α = 0.9.…”
Section: Formal Constraints (mentioning)
confidence: 85%
“…Finally, Cluster size vs. quantity gives higher scores to partitions where few clusters are provided but separates most classes. In addition to these formal constraints, the Unbalanced constraint was recently added by [4] and evaluates if a misclassification is present in a big class or in a small one. This constraint gives better scores when the incorrect classified element is from the biggest class.…”
Section: Formal Constraints (mentioning)
confidence: 99%
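The Unbalanced constraint described in the statement above can be illustrated with a small sketch. It assumes the usual BCubed definitions (per-item precision and recall, averaged over items and combined by a harmonic mean); the function name `bcubed_f`, the 8-versus-2 toy classes, and the two partitions are hypothetical and are not the example used in [4].

```python
from collections import Counter


def bcubed_f(classes, clusters):
    """BCubed F: harmonic mean of item-averaged precision and recall."""
    n = len(classes)
    class_size = Counter(classes)
    cluster_size = Counter(clusters)
    # Number of items sharing both the class and the cluster of a given item.
    joint = Counter(zip(classes, clusters))
    items = list(zip(classes, clusters))
    precision = sum(joint[(c, k)] / cluster_size[k] for c, k in items) / n
    recall = sum(joint[(c, k)] / class_size[c] for c, k in items) / n
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference classes: a big class A (8 items) and a small class B (2 items).
classes = ["A"] * 8 + ["B"] * 2

# One misplaced element taken from the big class (an A lands in B's cluster).
error_in_big = [0] * 7 + [1] + [1, 1]

# One misplaced element taken from the small class (a B lands in A's cluster).
error_in_small = [0] * 8 + [0, 1]

f_big = bcubed_f(classes, error_in_big)
f_small = bcubed_f(classes, error_in_small)

print(f"error in big class:   {f_big:.4f}")
print(f"error in small class: {f_small:.4f}")
```

The constraint asks for the first partition to score higher. On this toy data the unmodified BCubed F ranks the second partition higher instead, which is in line with the citing authors' remark that it misses the unbalanced case.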
“…Hence, for our research purpose the external indices could be more robust in comparison of clustering concordance between sample and complete datasets. In their research aiming to carry out the effect of sampling, de Souto et al (2012) also preferred to use the external validity indices for assessing the partitions for highly imbalanced datasets. In our study, since we expect that the cluster densities can be changed by the sampling rates we also assumed that the external indices would be more informative in comparison of the partitions obtained on different sample datasets.…”
Section: External Validity Indices and Clustering Quality (mentioning)
confidence: 99%