2022
DOI: 10.1007/s10994-022-06228-2

Speeding-up one-versus-all training for extreme classification via mean-separating initialization

Abstract: In this paper, we show that a simple, data-dependent way of setting the initial vector can be used to substantially speed up the training of linear one-versus-all classifiers in extreme multi-label classification (XMC). We discuss the problem of choosing the initial weights from the perspective of three goals. We want to start in a region of weight space (a) with low loss value, (b) that is favourable for second-order optimization, and (c) where the conjugate-gradient (CG) calculations can be performed quickly…
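
As a rough illustration of what a data-dependent, mean-separating initialization could look like for a single one-vs-all sub-problem, consider the sketch below. This is an assumption-laden sketch only: the function name, the scale factor, and the use of plain positive/negative feature means are illustrative choices, not the paper's exact formula.

    import numpy as np

    def mean_separating_init(X, y, scale=1.0):
        # Illustrative data-dependent initial weight vector for one binary
        # one-vs-all sub-problem: point the weights from the mean of the
        # negative instances towards the mean of the positive instances.
        # X: (n_samples, n_features) feature matrix
        # y: (n_samples,) binary labels in {0, 1} for the current label
        # scale: assumed free scaling factor (the paper derives its own choice)
        mu_pos = X[y == 1].mean(axis=0)   # mean of positive instances
        mu_neg = X[y == 0].mean(axis=0)   # mean of negative instances
        return scale * (mu_pos - mu_neg)  # direction separating the two means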

Cited by 8 publications (2 citation statements)
References 26 publications

“…This way, assuming logits are close to zero when we begin training, the model will assign probability ≈ k/n to each label instead of ≈ 1/2. A similar bias initialisation idea for MLC was discussed in (Schultheis and Babbar 2022), but it was not used in a neural network.…”
Section: Model Setup
confidence: 99%
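
To make the quoted idea concrete: if the remaining logits are near zero at the start of training, setting each label's bias to the inverse sigmoid of its prior frequency k/n makes the initial predicted probability equal that prior instead of 1/2. A minimal sketch, with a function name and interface that are illustrative rather than taken from either paper:

    import numpy as np

    def prior_bias_init(label_counts, n_samples, eps=1e-12):
        # Choose each label's bias so that, with near-zero logits from the
        # rest of the model, the initial predicted probability equals the
        # label's prior frequency k/n rather than sigmoid(0) = 1/2.
        # label_counts: (n_labels,) number of positive instances per label (k)
        # n_samples: total number of training instances (n)
        p = np.clip(label_counts / n_samples, eps, 1 - eps)  # prior k/n per label
        return np.log(p / (1 - p))  # inverse sigmoid: sigmoid(bias) == p
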
“…However, further study is required to analyze its theoretical features, optimize hyperparameters, and test its scalability to more expansive networks. [19] presents an approach to accelerate deep learning training using adaptive conjugate gradient (ACG) optimization. The paper introduces an ACG optimization algorithm for deep learning training, incorporating adaptive learning rates and conjugate gradient directions to update network weights.…”
Section: Related Work
confidence: 99%
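
For context on the conjugate-gradient directions mentioned in the quote above, the following is a generic Polak-Ribiere update sketch. It is only an illustration of the general technique, not the specific ACG algorithm of the citing work nor the CG solver used in the paper under discussion.

    def polak_ribiere_step(grad, grad_prev, direction_prev, lr):
        # Generic nonlinear conjugate-gradient direction update
        # (Polak-Ribiere with the usual non-negativity restart), shown only
        # to illustrate how a CG direction replaces the raw negative gradient.
        # grad, grad_prev, direction_prev: 1-D numpy arrays; lr: step size.
        beta = grad @ (grad - grad_prev) / max(grad_prev @ grad_prev, 1e-12)
        beta = max(beta, 0.0)                      # PR+ restart safeguard
        direction = -grad + beta * direction_prev  # new conjugate direction
        return direction, lr * direction           # direction and parameter update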