2009
DOI: 10.1007/978-3-642-04174-7_28

Latent Dirichlet Allocation for Automatic Document Categorization

Abstract: In this paper we introduce and evaluate a technique for applying latent Dirichlet allocation to supervised semantic categorization of documents. In our setup, every category is assigned its own collection of topics, and for a labeled training document only topics from its category are sampled. Thus, compared to classical LDA, which processes the entire corpus at once, we essentially build separate LDA models for each category with category-specific topics, and then these topic collections are…
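
The training restriction described in the abstract (a labeled document samples only from its own category's topics) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard collapsed Gibbs sampler, and the function name, the block layout of topic indices, and the hyperparameter values `alpha` and `beta` are all hypothetical choices made for illustration. The step after training is truncated in the abstract above and is not sketched here.

```python
import random
from collections import defaultdict


def train_category_restricted_lda(docs, labels, n_categories, topics_per_category,
                                  alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling in which a document labeled c may only use
    the topics in category c's block [c*K, (c+1)*K). All names and defaults
    here are illustrative assumptions, not taken from the paper."""
    rng = random.Random(seed)
    K = topics_per_category
    V = len({w for doc in docs for w in doc})  # vocabulary size
    n_topics = n_categories * K

    topic_word = [defaultdict(int) for _ in range(n_topics)]  # count(topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    doc_topic = [defaultdict(int) for _ in docs]              # count(doc, topic)
    assignments = []

    # Initialise each token with a random topic from its document's category block.
    for d, (doc, c) in enumerate(zip(docs, labels)):
        z_d = []
        for w in doc:
            z = c * K + rng.randrange(K)
            z_d.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assignments.append(z_d)

    for _ in range(n_iters):
        for d, (doc, c) in enumerate(zip(docs, labels)):
            allowed = list(range(c * K, (c + 1) * K))  # category-specific topics only
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Standard collapsed Gibbs weights, restricted to the allowed block.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in allowed
                ]
                z = rng.choices(allowed, weights=weights)[0]
                # Record the new assignment.
                assignments[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1

    return assignments, topic_word, topic_total


# Toy corpus (made up for illustration): two categories, two topics each.
docs = [["goal", "match", "team"], ["vote", "party", "law"],
        ["team", "goal", "goal"], ["law", "vote", "vote"]]
labels = [0, 1, 0, 1]
assignments, topic_word, topic_total = train_category_restricted_lda(
    docs, labels, n_categories=2, topics_per_category=2, n_iters=20)
```

Because sampling is confined to one block per document, words that occur only in category 0 documents never contribute counts to category 1 topics, which is what makes the result behave like separate per-category models sharing one topic index space.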

Citations: cited by 14 publications (7 citation statements)
References: 13 publications
“…A study was conducted by Bíró and Szabó [12] using LDA for web documents classification. Instead of estimating the topics of the entire documents in the corpus, they built separate LDA models for every category.…”
Section: Related Work (mentioning)
confidence: 99%
“…An alternative way to map the texts into a small number of latent topics using LDA had recently been applied to information retrieval and TC. A study was conducted by Bíró and Szabó [12] using LDA for web documents classification. Instead of estimating the topics of the entire documents in the corpus, they built separate LDA models for every category.…”
Section: Related Work (mentioning)
confidence: 99%
“…In a thesis written for the master's program at ELTE, Balogh examined anti-Roma statements on kuruc.info (BALOGH 2015). In a PhD dissertation, Bíró examined the document classification possibilities of latent Dirichlet allocation, extending the original LDA model into multi-corpus (MLDA) and link-based LDA models (BÍRÓ 2009). Bíró and colleagues applied the link-based LDA model to web spam filtering, where it proved somewhat more effective than competing solutions in testing (BÍRÓ et al 2009a).…”
Section: Research Background (unclassified)
“…However, there are various algorithms that can be adapted to your data and trained. To highlight the main topic of the text, a model based on the Latent Dirichlet Allocation (LDA) [10] algorithm was used. The main idea of this algorithm is that each document is considered as a set of topics in a certain proportion.…”
Section: The Analysis Module Of Text Subject (mentioning)
confidence: 99%