Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/361
Accelerating Extreme Classification via Adaptive Feature Agglomeration

Abstract: Extreme classification seeks to assign each data point the most relevant labels from a universe of a million or more labels. The task faces the dual challenge of high precision and scalability, with millisecond-level prediction times being the benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. In contrast to past works on feature clustering and selection, DEFRAG distinguishes itself in being able to scale to millions of features, and is e…
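The core idea in the abstract — replacing a huge raw feature space with a much smaller set of agglomerated features — can be sketched as follows. This is a minimal illustration with assumed names (`agglomerate_features` and a precomputed feature-to-cluster map), not DEFRAG's actual adaptive clustering algorithm:

```python
import numpy as np

def agglomerate_features(X, cluster_of):
    """Reduce a feature matrix by summing columns that share a cluster id.

    X          : (n_samples, n_features) dense array
    cluster_of : length-n_features array mapping each feature to a cluster id
    Returns an (n_samples, n_clusters) matrix.
    """
    n_clusters = int(cluster_of.max()) + 1
    X_red = np.zeros((X.shape[0], n_clusters))
    for j, c in enumerate(cluster_of):
        X_red[:, c] += X[:, j]
    return X_red

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 0.0, 1.0]])
clusters = np.array([0, 0, 1, 1])   # features {0,1} -> cluster 0, {2,3} -> cluster 1
X_red = agglomerate_features(X, clusters)
print(X_red)  # [[3. 7.] [1. 1.]]
```

A downstream extreme classifier would then be trained on `X_red` instead of `X`, shrinking per-label model sizes and prediction times.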

Cited by 19 publications (8 citation statements) · References 2 publications
“…A straightforward idea would be to take an embedding style approach to make the loss function ℓ(s, ŝ) = ‖s − ŝ‖₂². This is intuitive and matches prior approaches to XML [28,30]. However, we found that such an approach resulted in no learning and degenerate random-guessing performance.…”
Section: Dense Label Representations (supporting)
confidence: 86%
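The squared-error embedding loss quoted above, ℓ(s, ŝ) = ‖s − ŝ‖₂², can be written out directly (a minimal sketch; the function name is assumed):

```python
import numpy as np

def sq_embedding_loss(s, s_hat):
    """l(s, s_hat) = ||s - s_hat||_2^2, the squared Euclidean distance
    between a target embedding s and a predicted embedding s_hat."""
    return float(np.sum((s - s_hat) ** 2))

s = np.array([1.0, 0.0, 2.0])
s_hat = np.array([0.0, 0.0, 0.0])
print(sq_embedding_loss(s, s_hat))  # 5.0
```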
“…Our use of extreme multi-label classification problems is due to it being out-of-reach of current HRR methods, which we found produced random-guessing performance in all cases. There exists a rich literature of XML methods that tackle the large output space from the perspective of decision trees/ensembles [23][24][25][26][27], label embedding regression [28][29][30][31], naive Bayes [32], and linear classifiers [33,34]. There also exist deep learning XML methods that use a fully-connected output layer [35] and others that use a variety of alternative approaches to dealing with the large output space [36][37][38][39][40][41][42][43].…”
Section: Related Work (mentioning)
confidence: 99%
“…Therefore, a central challenge in XMC is to build classifiers which retain the accuracy of the one-vs-rest paradigm while being as efficiently trainable as the tree-based methods. Recently, there have been efforts to speed up the training of existing classifiers through better initialization and by exploiting the problem structure (Fang et al 2019; Liang et al 2018; Jalan et al 2019). In a similar vein, a recently proposed tree-based method, Parabel (Prabhu et al 2018), partitions the label space recursively into two child nodes using 2-means clustering.…”
Section: Related Work (mentioning)
confidence: 99%
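The Parabel-style recursive partitioning described above — repeatedly splitting the label set into two children via 2-means on label representation vectors — can be sketched roughly as follows. All names are assumed, and the crude Lloyd-iteration 2-means below is a stand-in for Parabel's actual balanced clustering:

```python
import numpy as np

def partition_labels(label_vecs, labels, max_leaf=2, rng=None):
    """Recursively split a set of labels into a binary tree via 2-means
    on their representation vectors (a Parabel-style sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    if len(labels) <= max_leaf:
        return labels                          # leaf: a plain list of labels
    V = label_vecs[labels]
    cent = V[rng.choice(len(labels), 2, replace=False)]  # random init
    for _ in range(10):                        # a few Lloyd iterations
        d = ((V[:, None, :] - cent[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in (0, 1):
            if (assign == k).any():
                cent[k] = V[assign == k].mean(0)
    left = [l for l, a in zip(labels, assign) if a == 0]
    right = [l for l, a in zip(labels, assign) if a == 1]
    if not left or not right:                  # degenerate split: stop here
        return labels
    return (partition_labels(label_vecs, left, max_leaf, rng),
            partition_labels(label_vecs, right, max_leaf, rng))

# Two well-separated groups of label vectors split cleanly at the root.
vecs = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
tree = partition_labels(vecs, [0, 1, 2, 3])
print(tree)
```

At prediction time, such a tree lets a classifier descend logarithmically many nodes instead of scoring every label, which is what makes tree-based XMC methods fast to train and query.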
“…Apart from the class of methods mentioned above, label-embedding approaches assume that, despite the large number of labels, the label matrix is effectively low rank and therefore project it to a low-dimensional sub-space [19,33,42]. In some of the works, it was argued that the low rank embedding may be insufficient for capturing the label diversity in XMC settings [7,36], which has been questioned in the recent work [16].…”
Section: Application To Other Algorithms (mentioning)
confidence: 99%
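The low-rank assumption behind label-embedding approaches — that the label matrix can be projected to a low-dimensional sub-space with little loss — can be illustrated with a truncated SVD. This is a generic sketch with assumed names, not any specific cited method:

```python
import numpy as np

def low_rank_label_embedding(Y, k):
    """Embed a (n_samples, n_labels) label matrix into k dimensions via
    truncated SVD: Y ~= (Y V_k) V_k^T, where V_k holds the top-k right
    singular vectors."""
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    Vk = Vt[:k].T        # (n_labels, k) label embedding matrix
    Z = Y @ Vk           # (n_samples, k) low-dimensional label codes
    return Z, Vk

# A rank-2 label matrix is reconstructed exactly from a 2-d embedding.
Y = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
Z, Vk = low_rank_label_embedding(Y, 2)
Y_rec = Z @ Vk.T
print(np.allclose(Y_rec, Y))  # True
```

When the label matrix is only approximately low rank, the reconstruction is lossy — which is precisely the diversity concern raised in [7,36] above.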