2018
DOI: 10.2478/pralin-2018-0004

Improving Topic Coherence Using Entity Extraction Denoising

Abstract: Managing large collections of documents is an important problem for many areas of science, industry, and culture. Probabilistic topic modeling offers a promising solution. Topic modeling is an unsupervised machine learning method and the evaluation of this model is an interesting problem on its own. Topic interpretability measures have been developed in recent years as a more natural option for topic quality evaluation, emulating human perception of coherence with word sets correlation scores. In this paper, w…

Cited by 6 publications (2 citation statements)
References 17 publications

“…Such quantification helps to discriminate between collections of terms semantically related and statistical noise. For a better understanding of the metric, we refer to (Cardenas et al., 2018; Huang, 2019).…”
Section: Methods (mentioning)
confidence: 99%
“…Though most topic models use bag-of-words representations, it is generally recognized that single words lack interpretability. Recent work by Cardenas et al. (2018) also reports that an entity can improve topic coherence compared to bag-of-words. In our work, we use entities that are provided by Semantic Scholar Open Corpus.…”
Section: Data Preprocessing (mentioning)
confidence: 99%