2021
DOI: 10.1109/tpds.2020.3047003
Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression

Cited by 28 publications (16 citation statements)
References 18 publications
“…Contrary to the earlier belief that teachers must be larger than students, recent studies have revealed that smaller teachers can be used to train larger students [10]. With such findings, blockwise distillation is used in various fields such as model compression [7,11] and NAS [9,12]. Since training a small teacher for a new task is quick and easy, blockwise distillation can be applied in most cases where traditional training is used.…”
Section: Introduction
confidence: 98%
“…Blockwise distillation [7,8,9] is one promising approach to mitigate such problems. As illustrated in Fig.…”
Section: Introduction
confidence: 99%
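For orientation: in blockwise distillation both networks are split into corresponding blocks, and each student block is trained to reproduce the output of the matching teacher block, which is what makes the per-block losses independent and amenable to parallel training. The following is a minimal PyTorch-style sketch under those assumptions; the toy networks, block boundaries, channel widths, and MSE objective are illustrative choices, not the exact setup of the cited papers.

```python
import torch
import torch.nn as nn

# Hypothetical teacher and student split into aligned blocks; the block
# boundaries and channel widths are illustrative assumptions.
teacher_blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU()),
    nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()),
])
# Student blocks are cheaper (fewer intermediate channels) but must
# produce outputs of the same shape as the matching teacher block.
student_blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 32, 1)),
    nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 64, 1)),
])

mse = nn.MSELoss()
optimizers = [torch.optim.Adam(b.parameters(), lr=1e-3) for b in student_blocks]

def distill_block(i, block_input):
    """One training step for student block i against teacher block i.

    Each step only needs that block's input and the frozen teacher block,
    so different blocks can be trained independently (and in parallel,
    e.g. on separate GPUs or worker processes).
    """
    with torch.no_grad():
        target = teacher_blocks[i](block_input)
    loss = mse(student_blocks[i](block_input), target)
    optimizers[i].zero_grad()
    loss.backward()
    optimizers[i].step()
    return loss.item()

# Use the teacher's own intermediate activations as block inputs, so no
# student block has to wait for an earlier student block to converge.
x = torch.randn(8, 3, 32, 32)
block_inputs = [x]
with torch.no_grad():
    for blk in teacher_blocks[:-1]:
        block_inputs.append(blk(block_inputs[-1]))

for i, inp in enumerate(block_inputs):
    distill_block(i, inp)
```

Because every block's target comes from the frozen teacher rather than from earlier student blocks, the final loop could be replaced by one worker per block, which is the property the parallel blockwise approach exploits.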
“…In summary, the significant contributions of this study are as follows: Instead of designing complex GCN‐based structures, we introduce the concept of knowledge distillation into GCN‐based recommendations and propose a TKDM to improve the recommendation effectiveness. Specifically, we design new knowledge distillation methods for GCN‐based recommendation systems by noting that existing ones [30–32] are proposed for classification tasks on traditional neural network components, for example, MLP and convolutional neural networks. We present and apply a SDM on an FRL‐net that effectively learns user and item feature representations. We then introduce a MDM and apply it on a UPL‐net to learn user preferences on items accurately. We comprehensively investigate the effectiveness of the proposed TKDM through extensive experiments on three real‐world data sets.…”
Section: Introduction
confidence: 99%
“…DNN compression methods reduce the model's memory size and computational requirements without significantly impacting its accuracy. DNN compression techniques are classified into five main categories [6]: pruning [7], quantization [8], compact convolutional filters [9], knowledge distillation [10], and low-rank factorization [11].…”
Section: Introduction
confidence: 99%
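To make the knowledge-distillation category in the list above concrete, the classic formulation trains a compact student on a weighted mix of the usual hard-label loss and a KL term over temperature-softened teacher logits. Below is a minimal PyTorch-style sketch; the toy classifiers and the values of the temperature T and weight alpha are illustrative assumptions, not values taken from the cited papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target knowledge distillation loss (Hinton-style).

    T and alpha are illustrative hyperparameters, not values from the paper.
    """
    # Cross-entropy against the ground-truth hard labels.
    hard = F.cross_entropy(student_logits, labels)
    # KL divergence between temperature-softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so both terms stay on a comparable scale
    return alpha * soft + (1.0 - alpha) * hard

# Usage with hypothetical models: the larger teacher runs frozen in eval
# mode, and only the compact student is updated.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
student = nn.Sequential(nn.Flatten(), nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.SGD(student.parameters(), lr=0.1)

x = torch.randn(16, 1, 28, 28)
y = torch.randint(0, 10, (16,))
with torch.no_grad():
    teacher_logits = teacher(x)
loss = kd_loss(student(x), teacher_logits, y)
opt.zero_grad()
loss.backward()
opt.step()
```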