2021
DOI: 10.7717/peerj-cs.474
Knowledge distillation in deep learning and its applications

Abstract: Deep learning based models are relatively large, and it is hard to deploy such models on resource-limited devices such as mobile phones and embedded devices. One possible solution is knowledge distillation whereby a smaller model (student model) is trained by utilizing the information from a larger model (teacher model). In this paper, we present an outlook of knowledge distillation techniques applied to deep learning models. To compare the performances of different techniques, we propose a new metric called d…
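The abstract describes the teacher-student setup only at a high level. As a purely illustrative aid (not the surveyed paper's own method), a minimal sketch of the classic soft-target distillation loss is given below; the function name, temperature T, and mixing weight alpha are assumptions chosen for the example.

```python
# Minimal sketch of a generic logit-based distillation loss (Hinton-style
# soft targets). T and alpha are illustrative assumptions, not values taken
# from the surveyed paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```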

Cited by 48 publications (29 citation statements). References: 39 publications.
“…The results indicate that a larger kernel of AlexNet (7×7) is more efficient on this task. In addition, to further optimize the detection system, we can deploy deep learning models on smartphones and get the result more conveniently and efficiently (Alkhulaifi et al 2021; Sujit et al 2021).…”
Section: Discussion (mentioning)
confidence: 99%
“…Teacher-Student architectures have been commonly applied in knowledge distillation for model compression, and some surveys [5], [6], [7] summarized the recent progress of various knowledge distillation techniques with Teacher-Student architectures. Specifically, Gou et al [5] presented a comprehensive survey on knowledge distillation from the following perspectives: knowledge types, distillation schemes, and Teacher-Student architectures.…”
Section: Introduction (mentioning)
confidence: 99%
“…Wang et al [6] provided a systematic overview and insight into knowledge distillation with Teacher-Student architectures in CV applications. Alkhulaifi et al [7] summarized multiple distillation metrics to compare the performances of different distillation methods. However, these aforementioned surveys do not discuss knowledge construction and optimization during the distillation process, where the knowledge types and optimization objectives are the important factors in providing informative knowledge for student learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…Knowledge elements that are transferred to the student models can be output values of certain layers in the teacher network, for example, it may be logits that precede softmax in classification. It is also possible to use internal layer output values of the teacher network [2]. This method shows good results for training more compact networks while maintaining the required accuracy, but there is no standard approach for organizing such a process.…”
Section: Introduction (mentioning)
confidence: 99%
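The excerpt above distinguishes distilling from the teacher's pre-softmax logits and from the outputs of internal teacher layers. As a hedged illustration of the latter (a FitNets-style hint loss, complementing the logit-based sketch after the abstract), the code below matches an intermediate student feature map to a teacher feature map through a 1×1 projection; the class name, channel arguments, and the use of MSE are assumptions for illustration only.

```python
# Illustrative sketch of intermediate-layer ("hint") distillation: the student
# feature map is projected to the teacher's channel width and matched with MSE.
# Names and shapes are assumptions; this is not the method of any cited paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # A 1x1 convolution aligns the student's channel count with the teacher's.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # Teacher features are detached so gradients flow only into the student.
        return F.mse_loss(self.regressor(student_feat), teacher_feat.detach())
```

In practice such a hint term would be added to the overall training loss alongside the logit-based term, with its weight treated as a hyperparameter.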