2022
DOI: 10.48550/arxiv.2204.03475
Preprint

Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results

Cited by 4 publications (9 citation statements)
References 0 publications
“…Furthermore, we can mix both the hard and soft target distillation by setting α to be between 0 and 1. We can also mix soft target distillation with supervised training by using L_RNNT on labeled data and L_KL on unlabeled data, which is used in existing distillation work in other domains [4,26,27]. In this paper, most of the experiments use α = 0 because we found that using L_KL alone achieves better WERs, as shown in Section 4.3.…”
Section: Distillation Methods
confidence: 95%
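The mixing described above amounts to a convex combination of a hard-target supervised loss and a soft-target KL distillation loss. Below is a minimal sketch in PyTorch, assuming a plain cross-entropy hard-target term as a stand-in for the cited paper's RNN-T loss; the function and parameter names are illustrative, not the paper's implementation.

import torch
import torch.nn.functional as F

def mixed_distillation_loss(student_logits, teacher_logits, targets,
                            alpha=0.0, temperature=1.0):
    # alpha = 1.0 -> purely hard-target (supervised) training;
    # alpha = 0.0 -> purely soft-target distillation, the setting the
    # citing paper reports as giving better WERs.
    # Hard-target term: cross-entropy here as a stand-in for the cited
    # paper's RNN-T loss on labeled data.
    hard = F.cross_entropy(student_logits, targets)
    # Soft-target term (L_KL): KL divergence between the teacher's and
    # the student's temperature-scaled output distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft

# Example usage with random tensors:
# student_logits = torch.randn(8, 1000); teacher_logits = torch.randn(8, 1000)
# targets = torch.randint(0, 1000, (8,))
# loss = mixed_distillation_loss(student_logits, teacher_logits, targets, alpha=0.0)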
“…Different from the result that using only the teacher's prediction (i.e. α_kd = ∞) can achieve top results [33], ground truth is still needed in GenSCL. A teacher classifier with better performance could potentially improve results further.…”
Section: KD Teacher Relative Weight
confidence: 94%
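One way to read the relative weight α_kd is as the weight of the distillation term relative to the ground-truth cross-entropy term, so that α_kd = ∞ corresponds to training on the teacher's prediction only. The parameterization below is an assumption made for illustration, not necessarily GenSCL's exact formula.

import math
import torch.nn.functional as F

def kd_weighted_loss(student_logits, teacher_logits, targets, alpha_kd=1.0):
    # alpha_kd weights the distillation term relative to the ground-truth
    # cross-entropy term (assumed parameterization, for illustration only).
    ce = F.cross_entropy(student_logits, targets)
    kd = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    if math.isinf(alpha_kd):
        # alpha_kd = inf: rely on the teacher's prediction only.
        return kd
    # Otherwise normalize so the total loss weight stays constant.
    return (ce + alpha_kd * kd) / (1.0 + alpha_kd)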
“…In addition to cross-entropy, the Kullback-Leibler divergence between the student's prediction and the teacher's prediction can boost the training of the student. A recent study [33] has shown that a training scheme based on modern tricks and knowledge distillation can stably achieve top results with cross-entropy. In the contrastive learning paradigm, contrastive representation distillation [39] has shown that using the teacher's representations to guide the student model…”
Figure 2: Generalized supervised contrastive learning framework, GenSCL.
Section: Related Work
confidence: 99%
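As a rough illustration of distilling at the representation level, the sketch below uses a simplified in-batch InfoNCE-style objective in which each student embedding should match the teacher embedding of the same input. This is only a toy version of the idea behind contrastive representation distillation [39], not its actual formulation.

import torch
import torch.nn.functional as F

def contrastive_feature_distillation(student_feats, teacher_feats, temperature=0.1):
    # Treat (student_i, teacher_i) as the positive pair and every other
    # teacher embedding in the batch as a negative.
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    logits = s @ t.t() / temperature                    # (B, B) similarity matrix
    labels = torch.arange(s.size(0), device=s.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)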