2021
DOI: 10.1109/access.2021.3109425

A Topic Coverage Approach to Evaluation of Topic Models

Abstract: Topic models are widely used unsupervised models capable of learning topics (weighted lists of words and documents) from large collections of text documents. When topic models are used for discovery of topics in text collections, a question that arises naturally is how well the model-induced topics correspond to topics of interest to the analyst. In this paper we revisit and extend a so far neglected approach to topic model evaluation based on measuring topic coverage: computationally matching model topics wit…

Cited by 12 publications (5 citation statements)
References 69 publications (156 reference statements)
“…The quality of topics generated by topic models can be assessed using evaluation metrics defined to quantify various topics, such as correlation, similarity, saliency, and relevance. Based on extensive research, various metrics and techniques have been devised to evaluate the topics produced by topic models and determine the optimal number of topics over the last two decades [36], [37].…”
Section: Evaluate Generated LDA Models On Metric Set (mentioning)
confidence: 99%
“…To obtain the prominent topics or […]nent words within the inspected documents, topic modeling was implemented t[…] the Orange TM toolkit. It was used to analyze and find the frequency of words in articles' abstracts, thus providing an indication about relevance to the inspected domain. Following Korenčić et al [41], the LDA unsupervised learning algorithm, which is one of the most popular topic modeling methods, was used. The algorithm is designed to analyze a large volume of unlabeled text.…”
Section: Topic Modeling and Word Cloud (mentioning)
confidence: 99%
“…The topic modeling procedure was used to analyze two types of document corpora: scientific papers and expert opinion columns. The analysis was performed on both repositories in the same manner: following Korenčić et al [23], the latent Dirichlet allocation (LDA) unsupervised learning algorithm was used to analyze a large volume of unlabeled text through the evaluation of clusters of words that frequently occur together [24,25], under the assumption that similar words appearing in similar contexts represent the same topic. Subsequently, the scientific corpus and the expert opinion columns' content were analyzed using the following methods: (1) A search for paragraphs containing words provided by the Merriam-Webster online thesaurus lexicon (https://www.merriam-webster.com/thesaurus, accessed on 29 December 2021).…”
Section: Topic Modeling and Sentiment Analysis (mentioning)
confidence: 99%