2020
DOI: 10.48550/arxiv.2006.05525
Preprint

Knowledge Distillation: A Survey

Jianping Gou,
Baosheng Yu,
Stephen John Maybank
et al.

Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this …

Cited by 35 publications (48 citation statements)
References 179 publications (439 reference statements)
“…Knowledge distillation [42] originally is designed for training a smaller model which can be deployed on edge devices. Nowadays, it has been a stepping stone for numerous algorithms [43]. The core concept of knowledge distillation is to provide a meaningful label representation from a pre-trained model, called the teacher model.…”
Section: Knowledge Distillation (mentioning)
confidence: 99%
“…We point the interested reader to Gou et al (2020) for a sweeping survey of the many developments in knowledge distillation over the past half decade. In addition to the references discussing theoretical aspects of knowledge distillation provided in Sec.…”
Section: A Extended Literature Review (mentioning)
confidence: 99%
“…Related work. Since we cannot review the vast literature on KD in its entirety, we point the interested reader to Gou et al (2020) for a recent overview of the field. We devote this section to reviewing theoretical advances in the understanding of KD and summarize complementary empirical studies and applications of in the extended literature review in App.…”
Section: Introduction (mentioning)
confidence: 99%
“…A well-trained model captures meaningful knowledge or information for a specific task. The knowledge distillation approach aims to distill the learning capacity of a larger deep neural network (teacher model) to a smaller network (student model) [28,29]. It has shown efficacy in cross-modal scenarios, where the teacher model is trained on one modality and the knowledge is transferred to another modality [14,30].…”
Section: Introduction (mentioning)
confidence: 99%