2017
DOI: 10.1007/978-3-319-73013-4_6

Combining Thesaurus Knowledge and Probabilistic Topic Models

Abstract: In this paper we present an approach for introducing thesaurus knowledge into probabilistic topic models. The main idea is that the frequencies of semantically related words and phrases that occur in the same texts should be enhanced, which increases their contribution to the topics found in these texts. We have conducted experiments with several thesauri and found that for improving topic models, it is useful to utilize domain-specific knowledge…
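The frequency-enhancement idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `boost_related_counts`, the `weight` parameter, and the toy thesaurus are assumptions made for illustration only. The sketch raises each word's per-document count by the counts of its thesaurus relatives that occur in the same document; the boosted counts would then be passed to a topic model in place of the raw counts.

```python
from collections import Counter

def boost_related_counts(tokens, related, weight=1.0):
    """Raise each word's count by the counts of its thesaurus relatives
    that occur in the same document (hypothetical helper, not the paper's code)."""
    counts = Counter(tokens)
    boosted = {}
    for word, n in counts.items():
        # Sum the frequencies of related words present in this document only.
        extra = sum(counts[r] for r in related.get(word, ()) if r != word and r in counts)
        boosted[word] = n + weight * extra
    return boosted

# Toy thesaurus (assumed data): each word maps to its related words.
related = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile"},
}

doc = ["car", "automobile", "car", "road", "engine"]
boosted = boost_related_counts(doc, related)
print(boosted)
```

In this toy document, "car" and "automobile" reinforce each other because both are present, while "road" and "engine" keep their raw counts because they have no thesaurus relatives in the text.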

Cited by 4 publications (2 citation statements)
References 15 publications
“…A noticeable number of studies are devoted to improving the parameters of topic vector models of the corpus, in which the vector representation of documents is made on a set of topics identified as a result of text analysis on the entire corpus. For example, in [22], the improvement of the topic model is performed by artificially increasing the coincidence of synonyms, and the authors of [23] introduce information about synonyms into the prior Dirichlet distribution in order to enhance the coherence of the topics. The work [24] proposed such a concept as Thesaurus-Based Topic Model and compared various topic models.…”
Section: Document Similarity Threshold (mentioning)
confidence: 99%
“…N. Loukachevitch and co-authors [7] propose a method for topic modeling that supplements sets of similar topics with terms from a manually created thesaurus. The authors conducted several experiments with algorithms that use no thesaurus, that use synonyms, and that use both synonyms and hypernyms, measuring quality with the kernel uniqueness metric.…”
Section: Review of related work (unclassified)