2018
DOI: 10.1007/978-3-319-98539-8_12

Subset Labeled LDA: A Topic Model for Extreme Multi-label Classification

Abstract: Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm that addresses multi-label learning tasks. Previous work has shown it to perform on par with other state-of-the-art multi-label methods. Nonetheless, with increasing label set sizes, LLDA encounters scalability issues. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm that not only can effectively scale up to problems with hundreds of thousands o…

Cited by: 8 publications (10 citation statements)
References: 14 publications

“…A few Bayesian methods have also been proposed (He et al 2012; Kapoor et al 2012; Jain et al 2017; Gaure et al 2017; Papanikolaou and Tsoumakas 2018), however they cannot be trained on XML datasets and/or they attain poor performance comparing to state-of-the-art methods. Our method bears similarities with most of those works.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, there has been interest in developing distributed linear methods [7,43] which can exploit distributed hardware. From a probabilistic view-point, bayesian approaches for multi-label classification have been developed in recent works such as [21,18] and Labeled LDA [30].…”
Section: Predictive Performance 5 Related Work (mentioning)
confidence: 99%
“…This is a disadvantage because most news publications have document "tags" (or "labels", loosely speaking, not to be confused with dataset label) that works as a topic, for example, an article about particular flood can have "flood", "disaster", and some tags indicating city location as well (such as "Jakarta"). These tags are valuable and can be used as additional supervision for the LDA, providing a multi-label learning that is explored by many authors [20], [58], [59]. With the introduction of tags as label, the unsupervised nature of LDA becomes supervised in Labeled LDA.…”
Section: Increasing Model Generalizability With Topic Modeling (mentioning)
confidence: 99%
“…The objective for ATM is to provide topic modeling tool while also solves the memory requirement of LLDA when dealing with a very large number of tags, without sacrificing the coherence of the produced topic sets. LLDA posits a single topic-word distribution for each unique tag (label) that it found in the document, leading to a huge memory requirement for very large number (more than 10,000) of tags, on which case can be considered as an extreme multi-label classification problem [58].…”
Section: Analysis Of the Topic And Event Space: Tying Themes To Geosp… (mentioning)
confidence: 99%
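
The memory concern raised in the last citation statement can be made concrete with a rough back-of-the-envelope estimate. The sketch below assumes that LLDA stores one dense topic-word distribution per label, held as 8-byte floats; the helper name `llda_topic_word_bytes` and the vocabulary and label counts are hypothetical values chosen for illustration, not figures from the cited papers.

```python
def llda_topic_word_bytes(num_labels: int, vocab_size: int,
                          bytes_per_value: int = 8) -> int:
    """Bytes for a dense (labels x vocabulary) topic-word matrix.

    Assumption: one distribution over the full vocabulary per label,
    stored densely; real implementations may use sparse storage.
    """
    return num_labels * vocab_size * bytes_per_value


if __name__ == "__main__":
    vocab_size = 100_000  # hypothetical vocabulary size
    for num_labels in (100, 10_000, 500_000):  # hypothetical label counts
        gib = llda_topic_word_bytes(num_labels, vocab_size) / 2**30
        print(f"{num_labels:>7} labels: {gib:,.2f} GiB")
```

Under these assumptions the storage grows linearly with the number of labels, which is consistent with the statement that very large tag sets lead to a large memory requirement.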