2019
DOI: 10.1109/access.2019.2957203

Pruning Blocks for CNN Compression and Acceleration via Online Ensemble Distillation

Abstract: In this paper, we propose an online ensemble distillation (OED) method to automatically prune blocks/layers of a target network by transferring the knowledge from a strong teacher in an end-to-end manner. To accomplish this, we first introduce a soft mask to scale the output of each block in the target network and enforce the sparsity of the mask by sparsity regularization. Then, a strong teacher network is constructed online by replicating the same target networks and ensembling the discriminative features fro…
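As a rough illustration of the soft-mask idea in the abstract, the sketch below wraps each block of a target network in a module whose output is scaled by a learnable mask and adds an L1 sparsity penalty on those masks. The names (MaskedBlock, mask_sparsity_penalty), the residual-style wrapping, and the penalty weight are illustrative assumptions, not the authors' exact formulation; the online ensemble teacher and distillation loss are omitted.

```python
import torch
import torch.nn as nn

class MaskedBlock(nn.Module):
    """Wraps a block and scales its output by a learnable soft mask.

    Illustrative sketch of the block-level soft mask described in the
    abstract; not the paper's exact formulation.
    """
    def __init__(self, block):
        super().__init__()
        self.block = block
        self.mask = nn.Parameter(torch.ones(1))  # one soft-mask scalar per block

    def forward(self, x):
        # The skip connection keeps the network functional even when the
        # sparsity penalty drives this block's mask toward zero.
        return x + self.mask * self.block(x)

def mask_sparsity_penalty(model, weight=1e-4):
    """L1 penalty on all soft masks, encouraging whole blocks to switch off."""
    masks = [m.mask for m in model.modules() if isinstance(m, MaskedBlock)]
    return weight * sum(m.abs().sum() for m in masks)
```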

Cited by 21 publications (14 citation statements)
References 49 publications
“…In ResNet-56, the CR and AR of the PKP method are 2.51× and 2.54×, respectively. The classification accuracy is 93.51%, which is a decrease of 0.09% from that of the baseline and is the highest compared to those of the methods in [11], [13], [14], [18], [21]–[23], [41]. The CR of the method in [41] reaches 2.93×, but the classification accuracy is 91.58%, which is a decrease of 2.39%.…”
Section: Experiments and Analysis (mentioning)
confidence: 88%
“…The CR of the method in [41] reaches 2.93×, but the classification accuracy is 91.58%, which is a decrease of 2.39%. The AR of the method in [23] reaches 3.09×, and the classification accuracy is 92.29%, which is a decrease of 1.68%. For ResNet-110, the proposed PKP has the highest CR and AR, reaching 2.32× and 2.18×, respectively, and the classification accuracy decreases by 0.45% from that of the baseline, reaching 93.82%; this is also the highest when compared to the classification accuracies of the methods in [13], [14], [18], [23], [41].…”
Section: Experiments and Analysis (mentioning)
confidence: 91%
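For context on the figures quoted above: compression ratio (CR) and acceleration ratio (AR) are conventionally the parameter and FLOP ratios between the baseline and the pruned network. The helpers below are a sketch under that assumption, and the roughly 93.60% ResNet-56 baseline is merely inferred from the quoted 93.51% accuracy and 0.09% drop.

```python
def compression_ratio(baseline_params: float, pruned_params: float) -> float:
    # CR: how many times fewer parameters the pruned network has.
    return baseline_params / pruned_params

def acceleration_ratio(baseline_flops: float, pruned_flops: float) -> float:
    # AR: how many times fewer FLOPs, a common proxy for speed-up.
    return baseline_flops / pruned_flops

# ResNet-56 numbers quoted above: 93.51% accuracy with a 0.09% drop
# implies a baseline of roughly 93.60%.
implied_baseline = 93.51 + 0.09
```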
“…Data-Free Learning (DFL) [34] combines generative adversarial networks and knowledge distillation to construct a data-free knowledge distillation framework, which can train high-performance student networks without data. Wang et al. [47] proposed a one-shot automatic pruning method based on online ensemble distillation, which removes the redundant structures of CNNs at once in a global way to obtain compact networks without any iterative pruning and retraining.…”
Section: B. Knowledge Distillation (mentioning)
confidence: 99%
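The one-shot, global pruning step the citing authors attribute to Wang et al. [47] can be pictured as a single pass that bypasses every block whose learned mask has been driven near zero, with no iterative prune/retrain loop. The sketch below reuses the hypothetical MaskedBlock wrapper from the earlier sketch and an illustrative threshold; it is not the paper's actual procedure.

```python
import torch
import torch.nn as nn

def prune_blocks_one_shot(model: nn.Module, threshold: float = 1e-2):
    """Bypass blocks whose soft mask fell below one global threshold, in a single pass."""
    # MaskedBlock is the illustrative wrapper defined in the sketch after the abstract.
    targets = [(n, m) for n, m in model.named_modules() if isinstance(m, MaskedBlock)]
    kept, pruned = [], []
    for name, module in targets:
        if module.mask.detach().abs().item() < threshold:
            with torch.no_grad():
                module.mask.zero_()       # forward reduces to the skip path: x + 0 * block(x)
            module.block = nn.Identity()  # drop the block's weights and compute
            pruned.append(name)
        else:
            kept.append(name)
    return kept, pruned
```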