Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, 2017
DOI: 10.18653/v1/e17-2070
Measuring Topic Coherence through Optimal Word Buckets

Abstract: Measuring topic quality is essential for scoring learned topics and for their subsequent use in information retrieval and text classification. To measure the quality of Latent Dirichlet Allocation (LDA) based topics learned from text, we propose a novel approach based on grouping topic words into buckets (TBuckets). A single large bucket signifies a single coherent theme, in turn indicating high topic coherence. TBuckets uses word embeddings of topic words and employs singular value decomposition (SVD) and Inte…
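The bucketing idea in the abstract can be illustrated with a small sketch. Note that this is not the paper's actual TBuckets method (which builds on SVD over word embeddings); the greedy cosine-similarity bucketing, the threshold value, and the toy vectors below are invented stand-ins meant only to show how "size of the largest bucket" can serve as a coherence score.

```python
import numpy as np

def bucket_coherence(embeddings, threshold=0.5):
    """Greedy cosine-similarity bucketing of topic-word embeddings.

    A word joins the first bucket whose centroid it resembles
    (cosine >= threshold); otherwise it starts a new bucket. The
    score is the size of the largest bucket divided by the number
    of words, so a single dominant theme scores near 1.
    """
    # L2-normalise rows so dot products are cosine similarities
    vecs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    buckets = []  # each bucket is a list of row indices
    for i, v in enumerate(vecs):
        placed = False
        for b in buckets:
            centroid = vecs[b].mean(axis=0)
            centroid /= np.linalg.norm(centroid)
            if float(v @ centroid) >= threshold:
                b.append(i)
                placed = True
                break
        if not placed:
            buckets.append([i])
    return max(len(b) for b in buckets) / len(vecs)

# Toy 2-D "embeddings": four similar words and one outlier
topic = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
                  [0.8, 0.1], [-0.1, 1.0]])
print(bucket_coherence(topic))  # 4 of 5 words share a bucket -> 0.8
```

A topic whose top words all land in one bucket scores 1.0; a topic that splinters into singleton buckets scores 1/N.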

Cited by 7 publications (9 citation statements). References 14 publications.
“…Instead of calculating pairwise word similarities, Rosner et al (2014) proposed the partitioning of the set of top topic words into subsets and averaged a "confirmation measure" based on conditional probability over the subset pairs. Ramrakhiyani et al (2017) proposed a coherence measure based on clustering the embeddings of top topic words and approximating the coherence with the size of the largest cluster.…”
Section: Topic Coherence
confidence: 99%
“…More specifically, Chang et al showed that models that fare better in predictive perplexity often have less interpretable topics, suggesting that evaluation should consider the internal representation of topic models and aim to quantify their interpretability. The idea soon gave rise to a new family of methods (Newman et al, 2010; Lau et al, 2014; Röder et al, 2015; Ramrakhiyani et al, 2017) that evaluate the semantic interpretability by measuring the topic coherence. These methods assume that topic coherence correlates with the coherence of the words assigned to that topic and thus quantify topic coherence as the coherence of top-ranked topic words.…”
Section: Introduction
confidence: 99%
“…Similarly, Ramrakhiyani et al (2017) made use of the same datasets and evaluations and presented a coherence measure which is approximated with the size of the largest cluster produced from embeddings of the top-N topic words. Human evaluation tasks have also been created to measure how representative a topic model is of the underlying document collection (Chang et al, 2009; Bhatia et al, 2017; Morstatter and Liu, 2017; Alokaili et al, 2019; Lund et al, 2019).…”
Section: Related Work
confidence: 99%
“…Coherence measures are commonly based on a score of mutual similarity between top topic-related words, which can be defined using a variety of word representations and similarity measures [9], [11], [42], [63], [73]- [77]. Alternate approaches include clustering of word embeddings [78] and querying search engines with top topic words [9]. In addition to topic coherence measures, alternate approaches to calculating topic quality have been proposed, based on calculating distances between topics and uninformative probability distributions [79], and on aligning model topics with WordNet concepts [80], [81].…”
Section: B. Topic Model Evaluation
confidence: 99%
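The last excerpt notes that most coherence measures score the mutual similarity of top topic-related words. A common embedding-based variant averages pairwise cosine similarities over the top-N words; the sketch below shows this baseline (the toy vectors are invented for illustration, and real uses would feed in pretrained word embeddings):

```python
import numpy as np
from itertools import combinations

def pairwise_coherence(embeddings):
    """Mean pairwise cosine similarity over a topic's top words.

    Higher values mean the word vectors point in similar directions
    in embedding space, i.e. the topic reads as a single theme.
    """
    vecs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = [float(vecs[i] @ vecs[j])
            for i, j in combinations(range(len(vecs)), 2)]
    return sum(sims) / len(sims)

# Invented toy vectors: a tight topic vs. one with an off-theme word
tight = np.array([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]])
mixed = np.array([[1.0, 0.1], [0.9, 0.2], [-0.2, 1.0]])
print(pairwise_coherence(tight) > pairwise_coherence(mixed))  # True
```

The cluster-based TBuckets approach discussed throughout these excerpts differs from this baseline in that one off-theme word merely shrinks the largest bucket, rather than dragging down every pairwise term in the average.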