2019
DOI: 10.36505/exling-2016/07/0025/000284

Automatic assignment of labels in Topic Modelling for Russian Corpora

Abstract: The main goal of this paper was to improve topic modelling algorithms by introducing automatic topic labelling, a procedure which chooses a label for a cluster of words in a topic. Topic modelling is a widely used statistical technique that reveals the internal conceptual organization of text corpora. We chose an unsupervised graph-based method and elaborated it for Russian. The proposed algorithm consists of two stages: candidate generation by means of PageRank and morphological filters…

Cited by 5 publications (5 citation statements)
References 3 publications
“…Our recent investigations are aimed to fill in the gap. In this study we used an ensemble of two graph-based methods using outer sources for candidate labels generation [24]: a) candidate labels extraction from Yandex search engine with their further ranking by TextRank (this is a graph-based model that takes into account the value of the each graph's vertex depending on how many links it forms) [25,26]; b) candidate labels extraction from Wikipedia by operations on word vector representations in Explicit Semantic Analysis (ESA) model [27,28]. These procedures comprise two stages: candidate extraction and ranking.…”
Section: Our Approach To Topic Labelling
Citation type: mentioning
confidence: 99%
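
The ranking stage described in the statement above can be illustrated with a small sketch. It is not the cited authors' implementation: the function name `rank_candidates`, the rule that two candidate labels are linked when they share a word, and the damping value are all assumptions made for illustration. It shows only the general TextRank idea that a vertex scores higher when more (and better-scored) vertices link to it.

```python
from collections import defaultdict
from itertools import combinations


def rank_candidates(candidates, damping=0.85, iterations=50):
    """Rank candidate labels with a PageRank-style iteration.

    Assumption (not from the source): two candidates are linked
    when they share at least one word.
    """
    candidates = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    tokens = {c: set(c.lower().split()) for c in candidates}

    # Build an undirected graph over the candidate labels.
    neighbours = defaultdict(set)
    for a, b in combinations(candidates, 2):
        if tokens[a] & tokens[b]:
            neighbours[a].add(b)
            neighbours[b].add(a)

    n = len(candidates)
    score = {c: 1.0 / n for c in candidates}
    for _ in range(iterations):
        new_score = {}
        for c in candidates:
            # Each neighbour passes on its score, split across its own links.
            incoming = sum(score[m] / len(neighbours[m]) for m in neighbours[c])
            new_score[c] = (1 - damping) / n + damping * incoming
        score = new_score

    return sorted(candidates, key=lambda c: score[c], reverse=True)


ranked = rank_candidates(
    ["topic model", "topic label", "label ranking", "weather"]
)
```

In this toy input, "topic label" shares a word with two other candidates and "weather" with none, so the former ends up first and the latter last.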
“…The algorithm for candidate labels extraction from Yandex search engine (Labels-Yandex) [25,26] is an elaboration of the procedure originally designed for visual information processing [13]. At the stage of candidate extraction the first 10 topical words for each topic form a separate query to Yandex.…”
Section: Topic Labelling Using Yandex
Citation type: mentioning
confidence: 99%
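
The candidate extraction step in the statement above (the first 10 topical words of each topic form one query) might look roughly like the following. The helper name `build_query`, the `(word, weight)` input format, and joining the words with spaces are assumptions; the sketch builds only the query string and does not call any search engine.

```python
def build_query(topic_words, top_n=10):
    """Build one search query from the top-weighted words of a topic.

    topic_words: list of (word, weight) pairs, as a topic model might
    produce them (hypothetical input format).
    """
    ranked = sorted(topic_words, key=lambda pair: pair[1], reverse=True)
    return " ".join(word for word, _ in ranked[:top_n])


# One query per topic; the words and weights here are made up.
topics = [
    [("модель", 0.12), ("тема", 0.30), ("корпус", 0.08)],
    [("граф", 0.25), ("вершина", 0.20)],
]
queries = [build_query(topic) for topic in topics]
```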
“…Topic modeling belongs to natural language processing, and it identifies patterns in a text [48]. We employed latent Dirichlet allocation (LDA) in this study, which is a commonly used topic modeling approach.…”
Section: Topic Modeling
Citation type: mentioning
confidence: 99%
“…Despite the evident necessity of integration of the two logics, it is rarely found also for other inflective languages; we see this logic explicitly employed by only one group working in Slovenian (see, e.g., Maučec et al 2004, and later works). Beside this, several works by computer linguists have suggested decisions for the Russian language, including adding automated labeling to Russian-language topics (Mirzagitova and Mitrofanova 2016) and showing the possibility of domain term extraction by topic modeling (Bolshakova et al 2013). Automatic topic labeling by a single word or phrase is expected to ease topic interpretation; working upon it continued in the recent years by comparing quality of two labeling algorithms, namely the vector-based Explicit Semantic Analysis (ESA) and graph-based method, with the former one preferred by the authors (Kriukova et al 2018).…”
Section: Computer-linguistic Approaches To Topic Modeling
Citation type: mentioning
confidence: 99%