2012
DOI: 10.48550/arxiv.1208.5654
Preprint

Document Clustering Evaluation: Divergence from a Random Baseline

Christopher M. De Vries,
Shlomo Geva,
Andrew Trotman

Abstract: Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are performing work, preventing them from giving high scores to ineffective clusterings that provide no useful result. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The diverg…
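The core idea can be illustrated with a minimal sketch. It assumes purity as the clusters-to-categories quality measure and builds the random baseline by shuffling the document-to-cluster assignments, which keeps the number of clusters and each cluster's size fixed. Both choices, and the helper names `purity`, `random_baseline` and `divergence_from_random`, are illustrative assumptions; the exact measures and baseline construction used in the paper are not given in this excerpt.

```python
import random
from collections import Counter


def purity(clusters, labels):
    """Fraction of documents that share the majority category of their cluster."""
    members = {}
    for cluster_id, label in zip(clusters, labels):
        members.setdefault(cluster_id, []).append(label)
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority_total / len(labels)


def random_baseline(clusters, rng):
    """Hypothetical helper: randomly reassign documents to clusters.

    Shuffling the assignment list keeps the number of clusters and
    every cluster's size unchanged.
    """
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    return shuffled


def divergence_from_random(clusters, labels, trials=100, seed=0):
    """Score of the clustering minus the mean score of shuffled baselines."""
    rng = random.Random(seed)
    baseline = sum(
        purity(random_baseline(clusters, rng), labels) for _ in range(trials)
    ) / trials
    return purity(clusters, labels) - baseline


labels = ["a", "a", "b", "b"]
# One document per cluster: purity is 1.0, but so is every shuffle, so 0.0.
print(divergence_from_random([0, 1, 2, 3], labels))
# Clusters that match the categories score above their random baseline.
print(divergence_from_random([0, 0, 1, 1], labels))
```

The singleton clustering shows the failure the abstract describes: raw purity awards it a perfect score, while its divergence from the random baseline is zero because a random clustering of the same shape scores just as well.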

Cited by 2 publications (2 citation statements); references 25 publications (43 reference statements).
“…in case of PCA-based representations). We computed the accuracy of LDA and reported the divergence from a random baseline [30] to quantify to which degree an input parameter or input representation is able to separate the underlying classes. The random baseline was estimated by the zero rule (always choosing the most frequent class in the dataset).…”
Section: Discussion (mentioning; confidence: 99%)
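The citing statement above measures divergence as classification accuracy minus a zero-rule baseline. A minimal sketch of that arithmetic follows; the function names are hypothetical, and it assumes the majority class is counted on the same labels used for evaluation, which may differ from how the citing work computed it over "the dataset".

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def zero_rule_accuracy(y_true):
    """Accuracy of always predicting the most frequent class (zero rule)."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


def divergence_from_zero_rule(y_true, y_pred):
    """Classifier accuracy minus the zero-rule baseline accuracy."""
    return accuracy(y_true, y_pred) - zero_rule_accuracy(y_true)


y_true = ["x"] * 8 + ["y"] * 2
always_x = ["x"] * 10
print(divergence_from_zero_rule(y_true, always_x))  # 0.8 - 0.8 -> 0.0
```

A classifier that only ever predicts the majority class reaches 80% accuracy on this toy label set yet diverges from the baseline by zero, which is why the divergence rather than the raw accuracy indicates whether the input separates the classes.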
“…Since in different experiments the random baseline varies, the absolute values of accuracy are of limited expressiveness. To enable a fair comparison, we employ the divergence from a random baseline approach [30] and thus provide for each experiment the difference between the random baseline and the absolute classification accuracy.…”
Section: E Classification (mentioning; confidence: 99%)
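The second statement uses the divergence to compare experiments whose random baselines differ. The sketch below ranks experiments by accuracy minus their own baseline; the experiment names and the (accuracy, baseline) pairs are hypothetical toy numbers chosen only to show that the ranking can flip, not results from either paper.

```python
def rank_by_divergence(results):
    """Sort experiment names by accuracy minus their own random baseline, best first."""
    return sorted(results, key=lambda name: results[name][0] - results[name][1], reverse=True)


def rank_by_accuracy(results):
    """Sort experiment names by raw accuracy, best first."""
    return sorted(results, key=lambda name: results[name][0], reverse=True)


# Hypothetical toy numbers: (classification accuracy, random-baseline accuracy).
results = {
    "experiment_A": (0.88, 0.90),  # skewed classes, high baseline
    "experiment_B": (0.75, 0.50),  # balanced classes, low baseline
}

print(rank_by_accuracy(results))    # ['experiment_A', 'experiment_B']
print(rank_by_divergence(results))  # ['experiment_B', 'experiment_A']
```

By raw accuracy experiment_A looks better, but it falls below its own baseline, while experiment_B exceeds its baseline by 0.25; subtracting each experiment's baseline puts them on a comparable footing.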