2009
DOI: 10.1007/s10791-009-9106-z

A comparison of extrinsic clustering evaluation metrics based on formal constraints

Abstract: There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies a…

Citations: cited by 205 publications (307 citation statements)
References: 1 publication
“…We summarize the baseline levels of performance provided by these optimal classifications in Table 1 using three statistics: pairwise precision, pairwise recall, and pairwise F-measure (Amigó, Gonzalo, Artiles, & Verdejo, 2009). Pairwise refers to the fact that the statistics are constructed by examining every pair of data points and asking whether the two are in the same class (according to either the fitted model or the ideal model).…”
Section: Results (mentioning)
confidence: 99%
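The pairwise statistics described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' code: the function name `pairwise_scores` and the list-of-labels input format are assumptions, and the F-measure is taken to be the usual harmonic mean of precision and recall.

```python
from itertools import combinations


def pairwise_scores(predicted, gold):
    """Return (precision, recall, F-measure) over all unordered item pairs.

    `predicted` and `gold` are equal-length sequences of labels
    (hypothetical input format chosen for this sketch).
    """
    same_pred = same_gold = same_both = 0
    for i, j in combinations(range(len(gold)), 2):
        in_same_cluster = predicted[i] == predicted[j]
        in_same_class = gold[i] == gold[j]
        same_pred += in_same_cluster
        same_gold += in_same_class
        same_both += in_same_cluster and in_same_class

    # Guard against empty denominators (e.g. every item in its own cluster).
    precision = same_both / same_pred if same_pred else 0.0
    recall = same_both / same_gold if same_gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, `pairwise_scores([0, 0, 1, 1], [0, 0, 0, 1])` examines six pairs, of which two share a predicted cluster, three share a gold class, and one shares both.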
“…Clonal genotypes were obtained by inferring the occurrence of the mutation on the branches of the clonal tree. The clustering accuracy of each method was measured using adjusted rand index and B-cubed F-score (Amigó et al, 2009) for datasets without and with doublets respectively. The genotyping performance was measured using Hamming distance (number of entries differing) between the true and inferred genotypes.…”
Section: Benchmarking On Simulated Datasets (mentioning)
confidence: 99%
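For reference, the B-cubed F-score named in the statement above can be sketched for the simple case where every item belongs to exactly one cluster and one class. This is an assumption of the sketch: the doublet setting in the citing paper may need the overlapping-cluster extension, which is not shown here. The function name `bcubed` is hypothetical.

```python
from collections import Counter


def bcubed(predicted, gold):
    """Return B-cubed (precision, recall, F) for non-overlapping clusterings.

    Per-item precision: share of the item's cluster that has its class.
    Per-item recall: share of the item's class that is in its cluster.
    Both are averaged over items; F is their harmonic mean.
    """
    n = len(gold)
    cluster_size = Counter(predicted)
    class_size = Counter(gold)
    joint = Counter(zip(predicted, gold))  # items per (cluster, class) cell

    precision = sum(joint[(c, k)] / cluster_size[c]
                    for c, k in zip(predicted, gold)) / n
    recall = sum(joint[(c, k)] / class_size[k]
                 for c, k in zip(predicted, gold)) / n
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score
```

Each item counts itself in both numerator and denominator, so a clustering identical to the gold classes scores 1.0 on all three values.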
“…Let a be the number of pairs of cells correctly partitioned into the same class by the clustering method; b be the number of pairs of cells partitioned into the same cluster but in fact belong to different classes; c be the number of pairs of cells partitioned into different clusters but belongs to the same class; and d be the number of pairs of cells correctly partitioned into different clusters. Then the Adjusted Rand Index [50], the Jaccard index [51], and the Fowlkes-Mallows index [52] can be defined as … + c); and the Purity [53] can be calculated as…”
Section: Evaluating the Stability Of Gene Lists (mentioning)
confidence: 99%
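The pair counts a, b, c, and d defined in the statement above can be computed as in the sketch below. The formulas in the excerpt are cut off, so the index definitions used here are the commonly cited textbook ones (Jaccard as a / (a + b + c), Fowlkes-Mallows as a / sqrt((a + b)(a + c)), purity as the summed majority-class count per cluster divided by the number of items) and may differ in detail from the citing paper. The Adjusted Rand Index is omitted, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pair_counts(predicted, gold):
    """Count item pairs by agreement between clustering and gold classes."""
    a = b = c = d = 0
    for i, j in combinations(range(len(gold)), 2):
        same_cluster = predicted[i] == predicted[j]
        same_class = gold[i] == gold[j]
        if same_cluster and same_class:
            a += 1  # same cluster, same class
        elif same_cluster:
            b += 1  # same cluster, different classes
        elif same_class:
            c += 1  # different clusters, same class
        else:
            d += 1  # different clusters, different classes
    return a, b, c, d


def jaccard(a, b, c):
    return a / (a + b + c) if a + b + c else 0.0


def fowlkes_mallows(a, b, c):
    denom = sqrt((a + b) * (a + c))
    return a / denom if denom else 0.0


def purity(predicted, gold):
    """Fraction of items that belong to the majority class of their cluster."""
    joint = Counter(zip(predicted, gold))
    best = Counter()
    for (cluster, _), count in joint.items():
        best[cluster] = max(best[cluster], count)
    return sum(best.values()) / len(gold)
```

Note that d does not enter the Jaccard or Fowlkes-Mallows values, so neither index rewards pairs that are correctly kept apart.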