2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00145
Similarity-Preserving Knowledge Distillation

Abstract: Knowledge distillation is a widely applicable technique for training a student neural network under the guidance of a trained teacher network. For example, in neural network compression, a high-capacity teacher is distilled to train a compact student; in privileged learning, a teacher trained with privileged data is distilled to train a student without access to that data. The distillation loss determines how a teacher's knowledge is captured and transferred to the student. In this paper, we propose a new form…
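The abstract describes a distillation loss that encourages the student to preserve the pairwise similarity structure of the teacher's activations across a mini-batch. A minimal PyTorch-style sketch of such a similarity-preserving loss (my reading of the approach; the function and tensor names are illustrative, and the exact normalization choice is an assumption):

```python
import torch
import torch.nn.functional as F

def sp_loss(f_t: torch.Tensor, f_s: torch.Tensor) -> torch.Tensor:
    """Similarity-preserving loss between teacher and student activations.

    f_t, f_s: feature maps of shape (b, c, h, w); channel/spatial sizes may
    differ between teacher and student, only the batch size b must match.
    """
    b = f_t.size(0)
    # Flatten each sample and build b x b pairwise similarity matrices,
    # then L2-normalize each row so that scale differences are removed.
    g_t = F.normalize(torch.mm(f_t.view(b, -1), f_t.view(b, -1).t()), p=2, dim=1)
    g_s = F.normalize(torch.mm(f_s.view(b, -1), f_s.view(b, -1).t()), p=2, dim=1)
    # Penalize the mean squared Frobenius distance between the two matrices.
    return (g_t - g_s).pow(2).sum() / (b * b)
```

The key point is that only the b x b similarity matrices are compared, so the teacher and student layers need not have matching dimensions.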

Citations: cited by 818 publications (496 citation statements)
References: 33 publications
“…There is active research on methods that can produce more compact networks with improved prediction capability. Common approaches include knowledge distillation [206], where a compact student network is trained to mimic a larger network, e.g., by guiding the network to produce similar activations for similar inputs, and advanced network models, such as Operational Neural Networks [207], where the linear operators of CNNs are replaced by various (non-)linear operations, allowing complex outputs to be produced with far fewer parameters [208], [209].…”
Section: Fast and Computationally Light Methods
Citation type: mentioning (confidence: 99%)
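The excerpt above describes distillation only in general terms (a compact student trained to mimic a larger teacher). For concreteness, a hedged sketch of the classic soft-target formulation such survey statements usually refer to; the function name, temperature value, and scaling are illustrative, not taken from the cited works:

```python
import torch.nn.functional as F

def soft_target_kd_loss(logits_s, logits_t, T: float = 4.0):
    """Soft-target distillation: the student matches the teacher's
    temperature-softened class probabilities. The T**2 factor keeps
    gradient magnitudes comparable across temperatures (common practice)."""
    log_p_s = F.log_softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```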
“…We selected 8 SOTA methods as baselines for the conventional full-category knowledge distillation task and compared them on the partial-category classification task. The involved methods include: Knowledge Distillation (KD) [2], FitNet [9], Attention Transfer (AT) [10], Similarity-Preserving Knowledge Distillation (SP) [34], Correlation Congruence (CC) [15], Variational Information Distillation (VID) [35], Relational Knowledge Distillation (RKD) [11], and Contrastive Representation Distillation (CRD) [16]. Table I shows the Top-1 accuracy of the fog networks obtained as students using all the baselines mentioned above, classifying a randomly sampled subset of CIFAR-100 classes (averaged over 20, 50, and 70 classes).…”
Section: B. Comparison 1: HBS Cloud-to-Fog vs. Modern Supervised Know…
Citation type: mentioning (confidence: 99%)
“…Relation-based knowledge does not focus on the values of particular layers but explores the relationships between different data samples or network feature layers. In our paper, the similarity-preserving knowledge distillation method [30] is used, as shown in Fig. 6.…”
Section: Response-based Knowledge
Citation type: mentioning (confidence: 99%)
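The last excerpt contrasts response-based knowledge (matching outputs) with the relation-based, similarity-preserving term. A hedged sketch of how such terms are commonly combined with the ordinary task loss during student training; it reuses the sp_loss and soft_target_kd_loss sketches above, and the weights alpha and gamma are illustrative hyperparameters, not values taken from the cited papers:

```python
import torch.nn.functional as F

def total_student_loss(logits_s, logits_t, feat_s, feat_t, labels,
                       alpha: float = 0.9, gamma: float = 3000.0):
    # Task loss on ground-truth labels, plus a response-based soft-target
    # term and a relation-based similarity-preserving term.
    # alpha and gamma are illustrative; in practice they are tuned per task.
    return (F.cross_entropy(logits_s, labels)
            + alpha * soft_target_kd_loss(logits_s, logits_t)
            + gamma * sp_loss(feat_t, feat_s))
```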