2018
DOI: 10.5539/cis.v11n4p77

Topic Modelling in Bangla Language: An LDA Approach to Optimize Topics and News Classification

Abstract: Topic modeling is a powerful technique for unsupervised analysis of large document collections. Topic models have a wide range of applications, including tag recommendation, text categorization, keyword extraction, and similarity search, in text mining, information retrieval, and statistical language modeling. Research on topic modeling is gaining popularity day by day. Various efficient topic modeling techniques are available for the English language, as it is one of the most spoken languages in the …

Cited by 14 publications (3 citation statements)
References 5 publications
“…The pre-processing operations were performed, which included cleaning the text from the characters and numbers, removing the stop signs and special symbols, removing the insignificant words, then reconfiguration data for representation unstructured data in a bag of words (corpus ). The second step is training the data by using two methods from topic modeling (LSA, LDA) method with different number of topics (10,15,20) in order to see which one will give the best performance from these two methods with our database. The keywords for each topic are used as features to classify the books.…”
Section: Results (mentioning)
confidence: 99%
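
The pre-processing steps named in the statement above (cleaning characters and numbers, removing stop words and symbols, then building a bag-of-words corpus) can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the stop-word list, the `min_len` cutoff, the sample documents, and the function names are all hypothetical, and the ASCII-only regular expression assumes English text, so it would need a different character class for Bangla.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}

def preprocess(text, min_len=3):
    """Lowercase, replace digits and symbols with spaces, drop stop words and short tokens."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [t for t in text.split() if t not in STOP_WORDS and len(t) >= min_len]

def build_dictionary(tokenized_docs):
    """Map each distinct token to an integer id, assigned in sorted order."""
    vocab = sorted({t for doc in tokenized_docs for t in doc})
    return {tok: i for i, tok in enumerate(vocab)}

def to_bow(tokens, dictionary):
    """Represent one document as sorted (token_id, count) pairs."""
    counts = Counter(dictionary[t] for t in tokens if t in dictionary)
    return sorted(counts.items())

# Made-up sample documents (assumption, not data from the cited work).
docs = [
    "The history of the Roman Empire, 27 BC to 476 AD!",
    "Cooking pasta: a guide to Roman recipes & sauces (2nd ed.)",
]
tokenized = [preprocess(d) for d in docs]
dictionary = build_dictionary(tokenized)
corpus = [to_bow(toks, dictionary) for toks in tokenized]

print(tokenized[0])  # ['history', 'roman', 'empire']
print(corpus[0])     # [(1, 1), (3, 1), (6, 1)]
```

The `dictionary` and `corpus` produced here play the roles of the "dictionary" and "corpus" inputs that a topic model is trained on.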
“…(Corpus, dictionary, and a number of topics) are needed to train the (LDA and LSA) model, where each word in the corpus of vocabulary is then connected with one or more topics with a probability, as estimated by the model. (LDA, LSA) model is built with (10,15,20) various topics where each topic is a mixture of keywords and each keyword contributes a certain weight to the topic.…”
Section: Results (mentioning)
confidence: 99%
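
To make the idea of "each topic is a mixture of keywords" with per-keyword weights concrete, here is a small sketch of LDA trained by collapsed Gibbs sampling for several topic counts. It is an assumption-laden illustration rather than the method of the quoted study, which most likely relied on a library: the function `lda_gibbs`, the hyperparameters `alpha` and `beta`, the toy vocabulary, and the topic counts 2 and 3 (standing in for 10, 15, and 20) are all hypothetical. LSA is not covered.

```python
import random

def lda_gibbs(docs, vocab_size, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids.
    Returns (doc_topic, topic_word); every row is a probability distribution.
    """
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens assigned to each topic
    z = []                                               # topic assignment per token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    doc_topic = [[(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
                 for row, doc in zip(ndk, docs)]
    topic_word = [[(c + beta) / (nk[k] + vocab_size * beta) for c in nkw[k]]
                  for k in range(num_topics)]
    return doc_topic, topic_word

# Toy data (assumption): word ids index into this made-up vocabulary.
vocab = ["goal", "match", "team", "vote", "party", "election"]
docs = [[0, 1, 2, 1, 0, 2], [3, 4, 5, 4, 3, 5], [0, 2, 1, 2], [5, 3, 4, 5]]

for num_topics in (2, 3):  # the quoted study used 10, 15 and 20
    doc_topic, topic_word = lda_gibbs(docs, len(vocab), num_topics)
    for k, row in enumerate(topic_word):
        top = sorted(range(len(vocab)), key=lambda w: -row[w])[:3]
        print(num_topics, k, [(vocab[w], round(row[w], 3)) for w in top])
```

Each row of `topic_word` gives the weight every keyword contributes to one topic, and the top-weighted keywords per topic are the kind of features the quoted statements describe using for classification.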