2021
DOI: 10.48550/arxiv.2111.11837
Preprint

Focal and Global Knowledge Distillation for Detectors

Abstract: Knowledge distillation has been applied to image classification successfully. However, object detection is much more sophisticated, and most knowledge distillation methods have failed on it. In this paper, we point out that in object detection the features of the teacher and student vary greatly in different areas, especially in the foreground and background. If we distill them equally, the uneven differences between feature maps will negatively affect the distillation. Thus, we propose Focal and Global Distillation…
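The abstract's central point is that foreground and background feature errors should not be weighted equally during distillation. Below is a minimal PyTorch sketch of that idea only; it is not the paper's actual FGD loss (which also includes attention-based and global terms), and the weights `alpha` and `beta` are hypothetical placeholders, not the paper's tuned hyperparameters.

```python
import torch

def focal_feature_distillation(student_feat, teacher_feat, fg_mask,
                               alpha=1.0, beta=0.5):
    """Feature-imitation loss that weights foreground and background
    pixels differently, in the spirit of focal distillation.

    student_feat, teacher_feat: (N, C, H, W) feature maps.
    fg_mask: (N, 1, H, W) binary mask, 1 inside ground-truth boxes.
    alpha, beta: illustrative foreground/background weights.
    """
    sq_err = (student_feat - teacher_feat) ** 2        # per-pixel error
    bg_mask = 1.0 - fg_mask
    # Normalize each region by its own pixel count so the loss does not
    # simply track region area.
    fg_loss = (sq_err * fg_mask).sum() / fg_mask.sum().clamp(min=1)
    bg_loss = (sq_err * bg_mask).sum() / bg_mask.sum().clamp(min=1)
    return alpha * fg_loss + beta * bg_loss
```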

Cited by 10 publications (37 citation statements) | References 29 publications
“…Early KD approaches for classification focus on transferring knowledge to student models by forcing their predictions to match those of the teacher [14]. More recent work [34,35,41] claims that feature imitation, i.e. forcing the intermediate feature maps of student models to match their teacher counterpart, is more effective for detection.…”
Section: Related Work
confidence: 99%
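The quoted passage describes plain feature imitation: regressing the student's intermediate feature map toward the teacher's. A minimal sketch follows; the 1x1 adaptation convolution for mismatched channel widths is a common convention assumed here, not taken from any one of the cited papers.

```python
import torch.nn as nn

class FeatureImitation(nn.Module):
    """Match a student feature map to its teacher counterpart with an
    MSE loss; a 1x1 conv adapts channel width when the two differ."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        self.adapt = (nn.Conv2d(student_channels, teacher_channels, 1)
                      if student_channels != teacher_channels
                      else nn.Identity())
        self.mse = nn.MSELoss()

    def forward(self, student_feat, teacher_feat):
        # Teacher features are fixed targets, so gradients are blocked.
        return self.mse(self.adapt(student_feat), teacher_feat.detach())
```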
“…A vital challenge when performing feature imitation for dense object detectors is determining which feature regions to distil from the teacher model. Naively distilling all feature maps equally results in poor performance [10,30,35]. To solve this problem, FGFI [34] distils features that are covered by anchor boxes which have a high IoU with the GT.…”
Section: Related Work
confidence: 99%
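The statement above turns on how the distillation region is chosen. As a rough illustration, the sketch below builds a binary mask over feature cells covered by ground-truth boxes; this is a simplified stand-in for FGFI's actual criterion, which selects features under anchor boxes with high IoU against the GT.

```python
import torch

def gt_box_mask(gt_boxes, feat_h, feat_w, stride):
    """Binary (1, 1, H, W) mask over a feature map, set to 1 where a
    cell falls inside any ground-truth box (boxes in image pixels,
    format [x1, y1, x2, y2]). Simplified stand-in for FGFI's
    anchor-IoU-based region selection."""
    mask = torch.zeros(1, 1, feat_h, feat_w)
    for x1, y1, x2, y2 in gt_boxes:
        # Project the box from image coordinates to feature-map cells.
        cx1, cy1 = int(x1 // stride), int(y1 // stride)
        cx2 = min(int(x2 // stride) + 1, feat_w)
        cy2 = min(int(y2 // stride) + 1, feat_h)
        mask[:, :, cy1:cy2, cx1:cx2] = 1.0
    return mask
```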