2019
DOI: 10.3390/e21070660

Estimating Topic Modeling Performance with Sharma–Mittal Entropy

Abstract: Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems, such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method that partially solves the problems of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models: probabilistic…
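For orientation, below is a minimal sketch of the Sharma–Mittal entropy of a discrete distribution in its textbook two-parameter form; how the paper maps topic-model quantities onto p, q, and r is not recoverable from the truncated abstract, so this is a generic reference implementation, not the authors' method.

```python
import numpy as np

def sharma_mittal_entropy(p: np.ndarray, q: float, r: float) -> float:
    """Sharma-Mittal entropy of a discrete distribution p (requires q != 1, r != 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # ignore zero-probability outcomes
    s = np.sum(p ** q)              # generalized moment: sum_i p_i^q
    return (s ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)
```

As r → 1 this tends to the Rényi entropy ln(s)/(1 − q), and at r = q it reduces to the Tsallis entropy (s − 1)/(1 − q), which is why a single two-parameter family can interpolate between the entropy measures used in related work.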

Cited by 29 publications (29 citation statements)
References 56 publications
“…Thus, a topic model is described by two observable parameters: (1) the sum of probabilities of highly probable words, P; (2) the number of highly probable words, N. Therefore, the partition function (statistical sum) of a topic model can be expressed as $Z_q = \rho \cdot (qP)^q$, where $q = 1/T$ [34]. Correspondingly, the Rényi entropy of a topic model is expressed in terms of the partition function as…”
Section: Entropy Approach for Analysis of Topic Models (mentioning; confidence: 99%)
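A minimal sketch of this computation on a topic-word matrix, assuming the conventions of this line of work: words with p(w|t) > 1/W count as highly probable, the density is ρ = N/(WT), and the entropy is recovered from the partition function as S_q^R = ln(Z_q)/(q − 1). None of these details appear verbatim in the quoted passage.

```python
import numpy as np

def renyi_entropy(phi: np.ndarray) -> float:
    """Renyi entropy of a topic solution; phi is the T x W topic-word matrix."""
    T, W = phi.shape
    q = 1.0 / T                      # deformation parameter q = 1/T (quoted)
    mask = phi > 1.0 / W             # assumed threshold for "highly probable"
    N = mask.sum()                   # number of highly probable words
    P = phi[mask].sum() / T          # their summed probability, normalized by
                                     # T (the normalization is an assumption)
    rho = N / (W * T)                # assumed density of highly probable words
    Z_q = rho * (q * P) ** q         # partition function Z_q = rho * (qP)^q
    return np.log(Z_q) / (q - 1.0)   # assumed form S_q^R = ln(Z_q) / (q - 1)
```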
“…A more detailed explanation of how Rényi entropy is formulated for topic models can be found in [7,34]. Applying Rényi entropy to the investigation of TM results is useful for the following reasons.…”
Section: Entropy Approach for Analysis of Topic Models (mentioning; confidence: 99%)
“…The entropy approach to TM tuning is based primarily on computing the Rényi entropy of each topic solution while varying the number of topics and hyperparameters [4,8]. For TM, the Rényi entropy is expressed as follows:…”
Section: Entropic Approach for Determining the Optimal Number of Topics (mentioning; confidence: 99%)
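The quoted expression is truncated; a hedged reconstruction from the partition function given in the earlier excerpt (an assumption consistent with this line of work, not text recovered from the citing paper):

```latex
S_q^R = \frac{\ln Z_q}{q - 1}
      = \frac{\ln\rho + q\,\ln(qP)}{q - 1},
\qquad q = \frac{1}{T}.
```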
“…In this area, $S_q^R$ reaches its minimum. It has been shown that the minimum of the Rényi entropy corresponds to the number of topics identified by human coders [8]. Hence, the search for the $S_q^R$ minimum could, at least partly, substitute for the manual labor of marking up document collections, substantially simplifying TM tuning on uncoded datasets.…”
Section: Entropic Approach for Determining the Optimal Number of Topics (mentioning; confidence: 99%)
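A minimal sketch of the minimum-search these passages describe, reusing the renyi_entropy helper sketched above; fit_topic_model is a hypothetical stand-in for any trainer (pLSA, LDA, etc.) that returns a T × W topic-word matrix:

```python
import numpy as np

def find_optimal_topic_number(fit_topic_model, docs, t_min=2, t_max=50):
    """Sweep T, score each solution by Renyi entropy, return the minimizer."""
    candidates = list(range(t_min, t_max + 1))
    entropies = [renyi_entropy(fit_topic_model(docs, T)) for T in candidates]
    return candidates[int(np.argmin(entropies))], entropies
```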
“…Thus, when conducting sentiment analysis of such texts, it is necessary to analyze not only the themes but also the viewpoints and positions, that is, the text's orientation. Moreover, the dominant research methods in sentiment analysis often ignore synonymy and polysemy in natural language, as well as the semantic correlations between vocabulary and documents, during modeling [4,5]. Other features, such as semantic structure (the latent semantic information of documents), are also ignored [6].…”
Section: Introduction (mentioning; confidence: 99%)