2017
DOI: 10.46298/arima.3102

Arabic topic identification based on empirical studies of topic models

Abstract: This paper addresses topic identification for the Arabic language using topic models. We study Latent Dirichlet Allocation (LDA) as an unsupervised method for Arabic topic identification. The study of LDA is carried out at two levels: the stemming process and the choice of LDA hyper-parameters. At the first level, we study the effect of different Arabic stemmers on LDA. At the second level, we focus on the LDA hyper-parameters α and β and their impact on topic identification. This study …

citations
Cited by 9 publications
(4 citation statements)
references
References 15 publications
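“…The number of topics and the LDA parameters are used, and the researcher must supply them at analysis time. In this study, following prior work [ 28 ], the LDA parameters were set as follows: the prior distribution α to 0.1, the within-topic prior distribution β to 0.01, and the number of iterations to 1,000. The number of topics was determined using both statistical and interpretive methods.…”
Section: Methods (unclassified)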
“…12 In the LDA analysis, the researcher should set the α and β parameters and the number of iterative implementations in advance. The LDA parameters α and β have various values in many studies, but according to Naili and colleagues, 16 the α value is between 0.1 and 0.01, and the β value is 0.01. The α value represents the distribution of topics per document.…”
Section: Topic Modeling (mentioning)
confidence: 99%
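To make the role of α and β in the statements above concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, assuming symmetric priors. It is not the implementation used by the cited authors or by the citing studies; the function name `lda_gibbs` and the toy English corpus are hypothetical, and the values α = 0.1, β = 0.01, and 1,000 iterations are taken from the quoted statement.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=1000, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA with symmetric priors.

    alpha smooths each document's topic counts; beta smooths each topic's
    word counts. Hypothetical helper, not taken from the cited paper.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Assign every token to a random topic to start.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional, up to a constant:
                # (n_dk + alpha) * (n_kv + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimate of each document's topic distribution.
    theta = [
        [
            (doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
            for t in range(num_topics)
        ]
        for d, doc in enumerate(docs)
    ]
    return theta, vocab, topic_word


# Toy corpus (hypothetical): each document is a list of already-stemmed tokens.
docs = [
    ["match", "goal", "team", "goal", "player"],
    ["team", "player", "match", "coach"],
    ["bank", "market", "price", "bank", "stock"],
    ["stock", "price", "market", "trade"],
]
theta, vocab, topic_word = lda_gibbs(docs, num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 3) for p in row])
```

In this sketch, α is added to the per-document topic counts and β to the per-topic word counts, so smaller values of either push the sampler toward documents dominated by few topics and topics dominated by few words.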
“…M. Naili et al [28] studied topic identification of Arabic texts using LDA. They showed that applying a stemmer increases the performance of topic identification.…”
Section: Literature Review (mentioning)
confidence: 99%
“…The topic modeling related works are organized by approaches such as topic modeling (unsupervised) [13,14,28,33–36], seeded topic modeling (semi-supervised) [29–32], and supervised topic modeling [17,37–39].…”
Section: Literature Review (mentioning)
confidence: 99%