2020
DOI: 10.1007/978-3-030-58595-2_40

Feature Normalized Knowledge Distillation for Image Classification

Cited by 53 publications (26 citation statements)
References 22 publications
“…To deploy deep HAR models on mobile wearable devices, model compression with minimal loss of accuracy is a critical and challenging task. Broadly, model compression methods include pruning [32], [33], [34], [35], [36], quantization [37], [38], low-rank approximation & sparsity [39], [40], and knowledge distillation [27], [41], [42]. Model pruning removes non-significant weights from large models, and the resulting pruned large-sparse models still achieve strong performance [43].…”
Section: Related Work
confidence: 99%
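To make the pruning idea in the statement above concrete, here is a minimal magnitude-based weight-pruning sketch in PyTorch. The helper name `magnitude_prune` and the default sparsity level are illustrative assumptions for this report, not details taken from any of the cited works.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so that roughly `sparsity` of the
    weights become zero, yielding the kind of 'large-sparse' model discussed above.
    (Function name and default sparsity are illustrative assumptions.)"""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    return weight * mask
```

In practice such a mask is usually applied iteratively during fine-tuning rather than once, but the one-shot version above captures the core operation.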
“…Knowledge distillation [27], [41], [42] plays a vital role in developing deep learning models that are friendly to wearable embedded devices. According to which elements of the deep model are distilled [25], KD can be divided into response-based KD [27], [42], feature-based KD [41], [49], and relation-based KD [50], [51]. Hinton et al. [27] formally popularized the idea of knowledge distillation in 2015.…”
Section: Related Work
confidence: 99%
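The response-based KD mentioned above is the classic Hinton-style setup, in which the student mimics the teacher's temperature-softened output logits. Below is a minimal sketch of that loss in PyTorch; the temperature `T=4.0` and weighting `alpha=0.9` are illustrative hyperparameters, not values reported in the cited works.

```python
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits: torch.Tensor,
                     teacher_logits: torch.Tensor,
                     labels: torch.Tensor,
                     T: float = 4.0,       # illustrative temperature
                     alpha: float = 0.9):  # illustrative KD weight
    """Hinton-style response-based KD: KL divergence between temperature-softened
    teacher and student distributions, plus cross-entropy on the hard labels."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # T^2 scaling keeps the gradient magnitude comparable to the CE term
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```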
“…Generally, most KD methods distill response-based knowledge (e.g., the soft logits of the output layer) from a large teacher network and transfer it to a small student [20,32,22]. To overcome the limitation of knowledge taken only from the teacher's output layer, feature-based knowledge from the teacher's intermediate layers is also used to train the student [21,18,19]. Unlike both response-based and feature-based knowledge, which come from individual instances, relation-based knowledge is modelled from instance relations to improve student learning [37,38,15,16,17].…”
Section: Introduction
confidence: 99%
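The feature-based branch of this taxonomy can be illustrated with a generic FitNets-style hint loss: an adapter maps a student feature map to the teacher's channel width and an L2 loss pulls the two together. This is a sketch of the general idea only, not the feature-normalized method of the indexed paper; the class name and the 1x1 adapter are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureHintLoss(nn.Module):
    """Generic feature-based distillation: a 1x1 convolution maps the student's
    intermediate feature map to the teacher's channel width, then an L2 loss
    pulls the adapted student features toward the (frozen) teacher features."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.adapt = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, f_student: torch.Tensor, f_teacher: torch.Tensor) -> torch.Tensor:
        # Assumes matching spatial size; the teacher features are detached so
        # no gradients flow back into the teacher network.
        return F.mse_loss(self.adapt(f_student), f_teacher.detach())
```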