Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

Han, Song; Mao, Huizi; Dally, William J.

doi:10.48550/arxiv.1510.00149

Cited by 1,252 publications (1,992 citation statements)

References 15 publications

Mentioning citation statements: 1,982

“…But on the one hand, existing model compression methods focus only on creating compressed models for efficient inference, without considering how compression affects the training process (Han et al, 2015; Chen et al, 2015; Kadetotad et al, 2016; Li et al, 2016; Polino et al, 2018) or how to reduce the accuracy loss caused by compression. On the other hand, existing knowledge transfer methods have the following limitations: 1) they still require large student models that are not suited to resource-constrained devices (Romero et al, 2014; Li et al, 2019; Yim et al, 2017); and 2) they only enable the student model to classify the categories it was trained on.…”

Section: Background and Motivations (mentioning; confidence: 99%)

“…Without loss of generality, we consider image classification tasks and use ResNet as an example to discuss our proposed on-device learning solution. Image classification is important for many edge applications and is also the target task of related model compression and knowledge distillation work (Hinton et al, 2015; Han et al, 2015; Chen et al, 2015; Polino et al, 2018; Srinivas & Babu, 2015). ResNet is a modern architecture with streamlined convolutional layers.…”

Section: Filter Pruning Based Model Compression (mentioning; confidence: 99%)

“…Model compression techniques can be broadly classified into three categories: weight sharing, quantization, and pruning. Weight sharing reduces the occupied memory by using the same set of weights to represent more than one transformation (Han et al, 2015; Chen et al, 2015). Quantization reduces the size of the model by shrinking the number of bits needed to store the weights (Han et al, 2015; Kadetotad et al, 2016).…”

Section: Related Work (mentioning; confidence: 99%)
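To make weight sharing and quantization concrete, here is a minimal NumPy sketch. It assumes a single weight matrix and plain 1-D k-means, with centroids initialized linearly between the minimum and maximum weight. The function names `share_weights` and `storage_bits` are hypothetical. The sketch omits codebook fine-tuning during retraining and the Huffman coding stage named in the Deep Compression title. It only shows how a small shared codebook plus low-bit per-weight indices can replace full-precision weights.

```python
import numpy as np

def share_weights(weights: np.ndarray, n_clusters: int = 16, n_iter: int = 20):
    """Hypothetical helper: cluster weights into n_clusters shared values.

    Returns (codebook, indices) such that codebook[indices] approximates weights.
    """
    flat = weights.ravel()
    # Linear initialization: centroids spread evenly between min and max weight.
    codebook = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # Assign every weight to its nearest centroid.
        indices = np.argmin(np.abs(flat[:, None] - codebook[None, :]), axis=1)
        # Move each centroid to the mean of its members (empty clusters stay put).
        for k in range(n_clusters):
            members = flat[indices == k]
            if members.size:
                codebook[k] = members.mean()
    indices = np.argmin(np.abs(flat[:, None] - codebook[None, :]), axis=1)
    return codebook, indices.reshape(weights.shape)

def storage_bits(n_weights: int, n_clusters: int, weight_bits: int = 32):
    """Bits needed before sharing (full-precision weights) and after sharing
    (ceil(log2(n_clusters))-bit indices plus the full-precision codebook)."""
    index_bits = int(np.ceil(np.log2(n_clusters)))
    before = n_weights * weight_bits
    after = n_weights * index_bits + n_clusters * weight_bits
    return before, after

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)).astype(np.float32)
    codebook, idx = share_weights(w, n_clusters=16)
    print(np.unique(codebook[idx]).size)  # at most 16 distinct values remain
    print(*storage_bits(w.size, 16))      # 131072 16896
```

With 16 clusters, each weight is stored as a 4-bit index instead of a 32-bit float. A 64×64 layer therefore shrinks from 131,072 to 16,896 bits including the codebook, about 7.8×.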

“…Weight sharing reduces the occupied memory by using the same set of weights to represent more than one transformation (Han et al, 2015; Chen et al, 2015). Quantization reduces the size of the model by shrinking the number of bits needed to store the weights (Han et al, 2015; Kadetotad et al, 2016). Pruning removes redundant weights or neurons while minimizing accuracy loss.…”

Section: Related Work (mentioning; confidence: 99%)
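Magnitude-based pruning illustrates the "remove redundant weights" idea. The minimal sketch below assumes that redundancy is judged by absolute weight value and that the pruning level is given as a target sparsity. Both choices, and the helper names `magnitude_prune` and `masked_sgd_step`, are illustrative assumptions. The second helper shows one way to keep pruned connections at zero while the surviving weights are retrained to recover accuracy.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float):
    """Hypothetical helper: zero out the smallest-magnitude `sparsity` fraction.

    Returns (pruned_weights, mask); mask is True where a weight is kept.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * weights.size))
    if n_prune == 0:
        mask = np.ones(weights.shape, dtype=bool)
    else:
        # Threshold = magnitude of the n_prune-th smallest weight.
        threshold = np.partition(np.abs(weights).ravel(), n_prune - 1)[n_prune - 1]
        mask = np.abs(weights) > threshold
    return weights * mask, mask

def masked_sgd_step(weights, grad, mask, lr=0.01):
    """One SGD update during retraining; pruned connections stay exactly zero."""
    return (weights - lr * grad) * mask

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.normal(size=(100, 100))
    pruned, mask = magnitude_prune(w, sparsity=0.9)
    print(mask.sum())  # 1000 of 10000 weights survive
    pruned = masked_sgd_step(pruned, rng.normal(size=w.shape), mask)
```

Ties at the threshold would prune slightly more than the target fraction. This does not arise with continuous random weights.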

“…To deploy DNNs on resource-constrained devices, there are two general approaches. The first approach aims to compress already-trained models, using techniques such as weight sharing (Chen et al, 2015), quantization (Han et al, 2015; Kadetotad et al, 2016), and pruning (Han et al, 2015; LeCun et al, 1990; Srinivas & Babu, 2015). However, a compressed model generated by these approaches is useful only for inference; it cannot be retrained to capture user- or device-specific requirements or new data available at runtime.…”

Section: Introduction (mentioning; confidence: 99%)


Enabling Deep Learning on Edge Devices through Filter Pruning and Knowledge Transfer

Zhao¹, Chen², Zhao³ (2022). Preprint.

Deep learning models have brought various intelligent applications to edge devices, such as image classification, speech recognition, and augmented reality. There is an increasing need to train such models on the devices themselves in order to deliver personalized, responsive, and private learning. To address this need, this paper presents a new solution for deploying and training state-of-the-art models on resource-constrained devices. First, the paper proposes a novel filter-pruning-based model compression method that creates lightweight trainable models from large models trained in the cloud, without much loss of accuracy. Second, it proposes a novel knowledge transfer method that lets the on-device model update incrementally, in real time or near real time, through incremental learning on new data. The same method lets the on-device model learn unseen categories in an unsupervised fashion with the help of the in-cloud model. The results show that 1) our model compression method can remove up to 99.36% of the parameters of WRN-28-10 while preserving a Top-1 accuracy of over 90% on CIFAR-10; 2) our knowledge transfer method enables the compressed models to achieve more than 90% accuracy on CIFAR-10 while retaining good accuracy on old categories; 3) it allows the compressed models to converge in real time (three to six minutes) on the edge for incremental learning tasks; and 4) it enables the model to classify unseen categories of data that it was never trained on (78.92% Top-1 accuracy).
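The abstract does not state how filters are selected for removal. The minimal sketch below therefore uses a common L1-norm ranking criterion as an assumption. It is not necessarily the method Zhao et al. propose. The sketch shows why filter pruning yields a smaller dense model that can still be trained: dropping a layer's output filters also removes the matching input channels of the next layer. It assumes NumPy arrays in (out_channels, in_channels, kh, kw) layout and ignores biases and batch-norm parameters. `prune_filters_l1` and `prune_next_layer_inputs` are hypothetical names.

```python
import numpy as np

def prune_filters_l1(conv_weight: np.ndarray, keep_ratio: float):
    """Hypothetical helper: keep the filters with the largest L1 norms.

    conv_weight has shape (out_channels, in_channels, kh, kw).
    Returns (pruned_weight, kept_indices) with indices in original order.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_out = conv_weight.shape[0]
    n_keep = max(1, int(round(keep_ratio * n_out)))
    l1 = np.abs(conv_weight).reshape(n_out, -1).sum(axis=1)
    kept = np.sort(np.argsort(l1)[::-1][:n_keep])
    return conv_weight[kept], kept

def prune_next_layer_inputs(next_weight: np.ndarray, kept: np.ndarray):
    """Drop the input channels of the next conv layer that consumed removed filters."""
    return next_weight[:, kept]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conv1 = rng.normal(size=(64, 3, 3, 3))
    conv2 = rng.normal(size=(128, 64, 3, 3))
    conv1_p, kept = prune_filters_l1(conv1, keep_ratio=0.25)
    conv2_p = prune_next_layer_inputs(conv2, kept)
    print(conv1.size + conv2.size, conv1_p.size + conv2_p.size)  # 75456 18864
```

Unlike the unstructured magnitude pruning used in Deep Compression, the result here stays a set of ordinary dense tensors. This is one reason filter pruning suits on-device training without sparse-kernel support.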


Deep Learning in the Era of Edge Computing: Challenges and Opportunities

Zhang, Zhang², Lane et al. (2020). Fog Computing.

The era of edge computing has arrived. Although the Internet is the backbone of edge computing, its true value lies at the intersection of gathering data from sensors and extracting meaningful information from that data. We envision that in the near future, the majority of edge devices will be equipped with machine intelligence powered by deep learning. However, deep learning-based approaches require a large volume of high-quality data to train and are very expensive in terms of computation, memory, and power consumption. In this chapter, we describe eight research challenges and promising opportunities at the intersection of computer systems, networking, and machine learning. Solving those challenges will enable resource-limited edge devices to leverage the powerful capabilities of deep learning. We hope this chapter will inspire new research that eventually leads to the realization of the vision of the intelligent edge.
