2019 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2019.00511

Correlation Congruence for Knowledge Distillation

Abstract: Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruent constraint on instance level. However, they usually ignore the correlation between multiple instances, which is also valuable for knowledge transfer. In this work, we propose a new framework named correlation congruence for knowledge distillation (CCKD), which transfers not only the instance-level information, but also the correlation between instances. Furthermore, a generalized kernel method based on Taylor seri…
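The abstract's central idea (matching inter-instance correlation matrices between teacher and student, with a Gaussian kernel approximated by a truncated Taylor series) can be sketched roughly as below. This is a minimal, hypothetical PyTorch sketch, not the authors' reference implementation; the bandwidth `gamma`, truncation `order`, and the MSE congruence metric are illustrative assumptions.

```python
# Minimal sketch of a correlation-congruence style loss (PyTorch assumed);
# gamma, order, and the MSE congruence metric are illustrative choices,
# not taken from the paper's released code.
import math
import torch
import torch.nn.functional as F

def taylor_gaussian_correlation(feats: torch.Tensor, gamma: float = 0.4, order: int = 2) -> torch.Tensor:
    """Batch correlation matrix from a Gaussian RBF kernel, truncated via Taylor expansion.

    feats: (B, D) L2-normalized embeddings. With ||x|| = ||y|| = 1,
    exp(-gamma * ||x - y||^2) = exp(-2*gamma) * exp(2*gamma * <x, y>),
    and the second factor is expanded to `order` terms.
    """
    sim = feats @ feats.t()                                   # pairwise cosine similarities, (B, B)
    corr = sum((2.0 * gamma * sim) ** p / math.factorial(p)   # truncated series of exp(2*gamma*sim)
               for p in range(order + 1))
    return math.exp(-2.0 * gamma) * corr

def correlation_congruence_loss(student_feats: torch.Tensor, teacher_feats: torch.Tensor) -> torch.Tensor:
    """Penalize the discrepancy between teacher and student correlation matrices."""
    c_s = taylor_gaussian_correlation(F.normalize(student_feats, dim=1))
    c_t = taylor_gaussian_correlation(F.normalize(teacher_feats, dim=1))
    return F.mse_loss(c_s, c_t)
```

In practice such a term would be added to the usual instance-level distillation loss rather than used alone.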


Cited by 471 publications (249 citation statements). References 31 publications.
“…We selected 8 SOTA methods as baselines for the conventional full-category knowledge distillation task, and compared 8 of them on the partial-category classification task. The involved methods include: Knowledge Distillation (KD) [2], FitNet [9], Attention Transfer (AT) [10], Similarity-Preserving Knowledge Distillation (SP) [34], Correlation Congruence (CC) [15], Variational Information Distillation (VID) [35], Relational Knowledge Distillation (RKD) [11], and Contrastive Representation Distillation (CRD) [16]. Table I shows the Top-1 accuracy of the fog networks obtained as students using all the baselines mentioned above, classifying a randomly sampled subset of CIFAR-100 classes (averaged over 20, 50, and 70 classes).…”
Section: B. Comparison 1: HBS Cloud-to-Fog vs. Modern Supervised Know…
Citation type: mentioning; confidence: 99%
“…Many methods have been proposed to minimize the performance gap between a student and a teacher. We discuss different forms of knowledge in the following categories: response-based knowledge [26, 27, 35], feature-based knowledge [28, 29, 36–45], and relation-based knowledge [30, 31, 46–49].…”
Section: Knowledge Distillation
Citation type: mentioning; confidence: 99%
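For context on the first category named in this statement, response-based knowledge is usually transferred by matching softened output distributions, as in Hinton-style KD. The sketch below assumes PyTorch; the temperature `T` and weight `alpha` are illustrative, not values used in the cited works.

```python
# Sketch of a response-based (logit) distillation objective, assuming PyTorch.
# T and alpha are illustrative hyperparameters, not taken from the cited papers.
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.9):
    """Hard-label cross-entropy plus KL divergence on temperature-softened logits."""
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)   # T^2 keeps the gradient scale comparable
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```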
“…The work in [48] proposed a Similarity-Preserving (SP) distillation method that transfers pairwise activation similarities of input samples. Peng et al. proposed a method based on CC [31], in which the distilled knowledge contains both instance-level information and the correlations between pairs of instances. Liu et al. proposed the Instance Relationship Graph (IRG) [49], which takes instance features, instance relationships, and the feature-space transformation across layers as the knowledge to transfer.…”
Section: Relation-Based Knowledge
Citation type: mentioning; confidence: 99%
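The similarity-preserving (SP) objective mentioned in this statement matches pairwise activation similarities within a mini-batch. Below is a hedged PyTorch sketch of that idea; the row normalization and MSE reduction follow the common formulation rather than the cited authors' exact code.

```python
# Hedged sketch of a similarity-preserving (SP) style loss, assuming PyTorch.
import torch
import torch.nn.functional as F

def sp_loss(student_acts: torch.Tensor, teacher_acts: torch.Tensor) -> torch.Tensor:
    """Match row-normalized batch similarity (Gram) matrices of student and teacher.

    student_acts / teacher_acts: (B, C, H, W) activation maps from one mini-batch.
    """
    b = student_acts.size(0)
    s = student_acts.reshape(b, -1)
    t = teacher_acts.reshape(b, -1)
    g_s = F.normalize(s @ s.t(), p=2, dim=1)   # (B, B) pairwise activation similarities
    g_t = F.normalize(t @ t.t(), p=2, dim=1)
    return F.mse_loss(g_s, g_t)
```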
“…By combining the ideas of the generative adversarial network (GAN) [47] and knowledge distillation, the student network can effectively learn to match the performance of the teacher network [48]. In [49], the authors proposed a knowledge distillation method based on the correlation between instances. Unlike previous knowledge distillation methods, the correlation congruence knowledge distillation (CCKD) method transfers not only the instance-level information, but also the correlation between instances.…”
Section: Knowledge Distillation
Citation type: mentioning; confidence: 99%
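As a rough illustration of how "not only the instance-level information, but also the correlation between instances" could be combined in one training objective, the sketch below pairs a softened-logit term with a simple cosine-correlation matching term. PyTorch is assumed; the weights `tau` and `beta` are hypothetical, and the plain cosine correlation stands in for the kernel-based correlation used in CCKD.

```python
# Illustrative combination of an instance-level term and an inter-instance
# correlation term, assuming PyTorch; tau and beta are hypothetical weights.
import torch
import torch.nn.functional as F

def combined_distillation_loss(s_logits, t_logits, s_feats, t_feats, labels,
                               tau: float = 4.0, beta: float = 0.01):
    # Instance-level: hard-label cross-entropy plus KL on temperature-softened logits.
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / tau, dim=1),
                  F.softmax(t_logits / tau, dim=1),
                  reduction="batchmean") * tau * tau
    # Inter-instance: match cosine-similarity matrices over the mini-batch.
    s_n = F.normalize(s_feats, dim=1)
    t_n = F.normalize(t_feats, dim=1)
    cc = F.mse_loss(s_n @ s_n.t(), t_n @ t_n.t())
    return ce + kd + beta * cc
```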