2020
DOI: 10.48550/arxiv.2006.13108
Preprint

Distilling Object Detectors with Task Adaptive Regularization

Ruoyu Sun,
Fuhui Tang,
Xiaopeng Zhang
et al.

Abstract: Current state-of-the-art object detectors are at the expense of high computational costs and are hard to deploy to low-end devices. Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization. In this paper, we investigate each module of a typical detector in depth, and propose a general distillation framework that adaptively transfers knowledge from teacher to student according to th…

Cited by 16 publications (45 citation statements)
References 26 publications
“…Wang et al [28] propose the fine-grained mask to distill the regions calculated by ground-truth bounding boxes. Sun et al [25] utilize the Gaussian Mask to cover the ground-truth for distillation. Such methods lack the distillation for the background.…”
Section: Knowledge Distillation
Citation type: mentioning
confidence: 99%
“…Mimick [15] distills the positive area proposed by region proposal network (RPN) of the student detector. FGFI [28] and TADF [25] use the fine-grained and Gaussian Mask to select the distillation area, respectively. Defeat [7] distills the foreground and background separately.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…KD was first popularised for image classification [14] where a student model is trained to mimic the soft labels generated by a teacher model. However, this [10,30,34], Our approach (d) focuses on a few key predictive regions of the teacher.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…While soft label-based KD can be directly applied for classification, finding an equivalent for localisation remains a challenge. Recent work [9,10,30,34,35,37,41] alleviates this problem by forcing the student model to generate feature maps similar to the teacher counterpart; a process known as feature imitation.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
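
The citation statements above describe feature imitation restricted by a Gaussian mask placed over the ground-truth boxes. The sketch below illustrates that general idea only; it is not taken from the paper. The function names, the choice of standard deviation (half the box width and height), and the normalisation are all assumptions made for illustration.

```python
import numpy as np


def gaussian_mask(height, width, boxes):
    """Build an (H, W) mask in [0, 1] from boxes given as (x1, y1, x2, y2).

    Assumption: each box gets a 2D Gaussian centred on the box, with
    sigma equal to half the box width/height, and zero weight outside
    the box. Overlapping boxes keep the larger weight.
    """
    grid = np.mgrid[0:height, 0:width].astype(np.float64)
    ys, xs = grid[0], grid[1]
    mask = np.zeros((height, width), dtype=np.float64)
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        sx = max((x2 - x1) / 2.0, 1e-6)
        sy = max((y2 - y1) / 2.0, 1e-6)
        g = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                     + (ys - cy) ** 2 / (2 * sy ** 2)))
        inside = (xs >= x1) & (xs <= x2) & (ys >= y1) & (ys <= y2)
        mask = np.maximum(mask, g * inside)
    return mask


def masked_imitation_loss(student, teacher, mask):
    """Mask-weighted mean squared error between (C, H, W) feature maps.

    Assumption: the loss is normalised by the total mask weight times
    the number of channels, so it does not grow with object size.
    """
    squared_error = (student - teacher) ** 2
    weighted = squared_error * mask[None, :, :]
    norm = max(mask.sum() * student.shape[0], 1e-12)
    return float(weighted.sum() / norm)
```

A call such as `masked_imitation_loss(s, t, gaussian_mask(8, 8, [(2, 2, 6, 6)]))` weights errors near the box centre most heavily and ignores everything outside the box, which is the property the citing papers point to when they note that such methods do not distill the background.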