2022
DOI: 10.1007/978-3-031-20077-9_8

Prediction-Guided Distillation for Dense Object Detection

Abstract: Real-world object detection models should be cheap and accurate. Knowledge distillation (KD) can boost the accuracy of a small, cheap detection model by leveraging useful information from a larger teacher model. However, a key challenge is identifying the most informative features produced by the teacher for distillation. In this work, we show that only a very small fraction of features within a groundtruth bounding box are responsible for a teacher's high detection performance. Based on this, we propose Predi…

Cited by 18 publications (4 citation statements)
References 35 publications
“…Traditional object detection entails the task of identifying and localizing all objects within an image or video frame, encompassing the simultaneous duties of classification and spatial localization. Recently, the successful application of knowledge distillation in traditional object detection has garnered attention [2,16,30,33,36]. In the pursuit of compact and efficient object detection networks, [3] have seamlessly integrated knowledge distillation, achieving heightened efficiency with minimal accuracy trade-offs.…”
Section: Knowledge Distillation In Object Detection (mentioning)
confidence: 99%
“…Therefore, we would like to use the knowledge of features of defects in gold tools and insulators as a focus for the instructor network to guide the student network. Therefore, the PGW (Prediction-Guided Weighting) (Yang et al, 2022) module is introduced to improve the prospect distillation region. And the PGW module is precisely concentrated in the first k feature pixels with the highest mass fraction in the prospect region.…”
Section: Knowledge Distillation Guided By Key Area Scoring (mentioning)
confidence: 99%
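
The selection step described in the statement above (keeping only the k highest-scoring feature pixels inside the foreground region) can be illustrated with a minimal sketch. This is not the PGW module itself: the function name `top_k_foreground_mask`, the box format, and the score values are hypothetical, and the sketch assumes a per-pixel quality score is already available.

```python
from typing import List, Tuple


def top_k_foreground_mask(
    scores: List[List[float]],
    box: Tuple[int, int, int, int],
    k: int,
) -> List[List[int]]:
    """Return a 0/1 mask marking the k highest-scoring pixels inside `box`.

    `scores` is an H x W grid of per-pixel quality scores (assumed given).
    `box` is (x0, y0, x1, y1) in inclusive pixel coordinates (assumed format).
    """
    x0, y0, x1, y1 = box
    # Collect every pixel inside the foreground box together with its score.
    candidates = [
        (scores[y][x], y, x)
        for y in range(y0, y1 + 1)
        for x in range(x0, x1 + 1)
    ]
    # Highest scores first; pixels outside the box are never considered.
    candidates.sort(key=lambda item: item[0], reverse=True)

    mask = [[0] * len(row) for row in scores]
    for _, y, x in candidates[:k]:
        mask[y][x] = 1
    return mask


# Hypothetical 3 x 4 score map; the box covers columns 1..2 and rows 0..2.
example_scores = [
    [0.10, 0.20, 0.90, 0.30],
    [0.40, 0.80, 0.70, 0.10],
    [0.20, 0.60, 0.50, 0.95],
]
example_mask = top_k_foreground_mask(example_scores, box=(1, 0, 2, 2), k=2)
```

In this example the 0.95 score lies outside the box, so it is ignored even though it is the largest value in the grid. A distillation loss could then be weighted by such a mask so that only the selected pixels contribute.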
“…At present, some researches have applied the knowledge distillation method to the field of electric power. Literature (Yang et al, 2022) proposes a compression and integration application method based on knowledge distillation. In this method, the Detr model is used to identify the initial target, and the Deformable Detr algorithm is used to compress the Detr model, so that the compression ratio reaches 87.5% and the target detection accuracy is maintained at a high level, and the effective integrated application of the target detection model in the substation inspection robot body is realized.…”
Section: Introduction (mentioning)
confidence: 99%
“…Knowledge Distillation, first introduced by Bucila et al [11] and popularized by [12], has served as a successful strategy for achieving a better trade-off between performance and efficiency of deep neural networks by using the knowledge of a more complex network (the teacher) to assist the training of a lighter network (the student). Methods based on knowledge distillation have greatly improved the accuracy of lightweight networks, performing tasks; such as image classification [13]- [17], object detection [18]- [20], and face recognition [21]- [23]. The knowledge distilled in the pioneering work of [12] for the task of image classification provided soft labels from a heavy teacher network with more beneficial information (e.g., intra-class similarity and inter-class difference), than the hard labels originally provided to the small network in the form of one-hot class label vectors.…”
Section: Introduction (mentioning)
confidence: 99%
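
The contrast between hard one-hot labels and teacher soft labels drawn in the statement above can be shown with a small sketch. The temperature-scaled softmax is a common way to produce soft labels and is an assumption here; the quoted statement does not specify it. The logit values and function names are hypothetical.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index: int, num_classes: int) -> List[float]:
    """Hard label: all probability mass on the ground-truth class."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Hypothetical teacher logits for three classes; class 0 is the ground truth.
teacher_logits = [6.0, 4.0, -2.0]

hard_label = one_hot(0, num_classes=3)
soft_label = softmax_with_temperature(teacher_logits, temperature=4.0)
sharp_label = softmax_with_temperature(teacher_logits, temperature=1.0)
```

The hard label says nothing about how classes 1 and 2 relate to class 0, whereas the soft label assigns class 1 noticeably more probability than class 2. That ordering is the kind of inter-class similarity information the statement refers to.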