2015
DOI: 10.48550/arxiv.1510.00149
Preprint

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Abstract: Neural networks are both computationally and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline (pruning, trained quantization, and Huffman coding) that works together to reduce the storage requirement of neural networks by 35× to 49× without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the …
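The three stages named in the abstract can be made concrete with a short sketch. Below is a minimal NumPy-only illustration of magnitude pruning, k-means weight sharing, and Huffman coding on a toy weight matrix; the threshold, cluster count, and layer size are illustrative assumptions, and the retraining steps the method uses to recover accuracy are omitted.

```python
# Minimal sketch of the three-stage "deep compression" pipeline on a toy layer.
# Illustrative assumptions: threshold 0.5, 16 clusters, random 64x64 weights.
# The paper also retrains after pruning/quantization and stores the sparse
# structure separately; both are omitted here for brevity.
import heapq
from collections import Counter

import numpy as np

def prune(w, threshold):
    """Stage 1: zero out connections whose magnitude is below the threshold."""
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize(w, n_clusters=16, n_iters=20):
    """Stage 2: weight sharing via 1-D k-means; returns per-weight cluster
    indices and the shared centroid table."""
    nonzero = w[w != 0]
    # Linear initialization over the weight range (one common choice).
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
    for _ in range(n_iters):
        idx = np.abs(nonzero[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            members = nonzero[idx == k]
            if members.size:
                centroids[k] = members.mean()
    codes = np.abs(w[..., None] - centroids).argmin(axis=-1)
    return codes, centroids

def huffman_table(symbols):
    """Stage 3: build a Huffman code over the quantization indices."""
    heap = [(count, i, {sym: ""}) for i, (sym, count) in
            enumerate(Counter(symbols).items())]
    heapq.heapify(heap)
    tie = len(heap)  # unique tiebreaker so the dicts are never compared
    while len(heap) > 1:
        lo_count, _, lo = heapq.heappop(heap)
        hi_count, _, hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (lo_count + hi_count, tie, merged))
        tie += 1
    return heap[0][2]

w = prune(np.random.randn(64, 64).astype(np.float32), threshold=0.5)
codes, centroids = quantize(w)
table = huffman_table(codes.ravel().tolist())
bits = sum(len(table[c]) for c in codes.ravel().tolist())
print(f"~{bits / 8:.0f} bytes coded vs {w.nbytes} bytes dense fp32")
```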

Cited by 1,252 publications (1,992 citation statements)
References 15 publications
“…But on one hand, existing model compression methods focused only on creating compressed models for efficient inference, without considering how compression methods affect the training process (Han et al., 2015; Chen et al., 2015; Kadetotad et al., 2016; Li et al., 2016; Polino et al., 2018) or how to reduce the accuracy loss caused by compression. On the other hand, existing knowledge transfer methods have the following limitations: 1) they still require large student models that are not fit for resource-constrained devices (Romero et al., 2014; Li et al., 2019; Yim et al., 2017); 2) they only enable the student model to classify the categories that the models are trained with.…”
Section: Background and Motivations
confidence: 99%
“…Without loss of generality, we consider image classification tasks and use ResNet as an example to discuss our proposed on-device learning solution. Image classification is important for many edge applications and is also the target task of the related model compression and knowledge distillation works (Hinton et al., 2015; Han et al., 2015; Chen et al., 2015; Polino et al., 2018; Srinivas & Babu, 2015). ResNet is a modern architecture with streamlined convolutional layers.…”
Section: Filter Pruning Based Model Compression
confidence: 99%
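The section heading of this excerpt refers to filter pruning, a structured variant of the pruning stage that removes whole convolutional filters rather than individual weights, yielding a smaller dense layer. Below is a minimal NumPy sketch assuming L1-norm ranking (in the spirit of Li et al., 2016) and an illustrative keep ratio; it is not the cited paper's exact method.

```python
# Minimal sketch of L1-norm filter pruning. Layer shape and keep ratio are
# illustrative assumptions. Note that removing filters also requires shrinking
# the input channels of the next layer, which is omitted here.
import numpy as np

def prune_filters(conv_w, keep_ratio=0.5):
    """Rank the filters of a conv layer (out_ch, in_ch, kH, kW) by L1 norm
    and keep only the strongest keep_ratio fraction."""
    norms = np.abs(conv_w).sum(axis=(1, 2, 3))      # one L1 norm per filter
    n_keep = max(1, int(keep_ratio * conv_w.shape[0]))
    keep = np.sort(np.argsort(norms)[-n_keep:])     # indices of kept filters
    return conv_w[keep]                             # smaller *dense* layer

w = np.random.randn(64, 3, 3, 3).astype(np.float32)  # toy 64-filter conv
w_pruned = prune_filters(w, keep_ratio=0.5)
print(w.shape, "->", w_pruned.shape)                  # (64, 3, 3, 3) -> (32, 3, 3, 3)
```

Unlike the unstructured pruning in the sketch above, this produces a regular dense tensor that needs no sparse indexing, which is why filter pruning is attractive for on-device inference and training.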