Discrimination-aware Network Pruning for Deep Model Compression

Jing, Liu; Zhuang, Bohan; Zhuang, Zhuangwei; Guo, Yong; Huang, Junzhou; Zhu, Jun; Tan, Mingkui

doi:10.1109/tpami.2021.3066410

Cited by 69 publications

(50 citation statements)

References 54 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Aside from quantization, there are also several other popular techniques to compress PLMs. Pruning removes unimportant parameters or connections in a well-trained model [36], [37], [38], [39], [40], and is widely explored in PLMs [41], [42]. A direct way is to remove connections with small magnitudes during the pre-training and adds them back when necessary in downstream tasks [41].…”

Section: Network Compression For Pre-trained Language Modelsmentioning

confidence: 99%

Towards Efficient Post-training Quantization of Pre-trained Language Models

Bai,

Hou,

Shang

et al. 2021

Preprint

View full text Add to dashboard Cite

Network quantization has gained increasing attention with the rapid growth of large pre-trained language models (PLMs). However, most existing quantization methods for PLMs follow quantization-aware training (QAT) that requires end-to-end training with full access to the entire dataset. Therefore, they suffer from slow training, large memory overhead, and data security issues. In this paper, we study post-training quantization (PTQ) of PLMs, and propose module-wise quantization error minimization (MREM), an efficient solution to mitigate these issues. By partitioning the PLM into multiple modules, we minimize the reconstruction error incurred by quantization for each module. In addition, we design a new model parallel training strategy such that each module can be trained locally on separate computing devices without waiting for preceding modules, which brings nearly the theoretical training speed-up (e.g., 4× on 4 GPUs). Experiments on GLUE and SQuAD benchmarks show that our proposed PTQ solution not only performs close to QAT, but also enjoys significant reductions in training time, memory overhead, and data consumption.

show abstract

Section: Network Compression For Pre-trained Language Modelsmentioning

confidence: 99%

Towards Efficient Post-training Quantization of Pre-trained Language Models

Bai,

Hou,

Shang

et al. 2021

Preprint

View full text Add to dashboard Cite

show abstract

“…In the case of ThiNet [32], x is a point sampled from the input feature map and reconstruction(x) is its reconstruction after pruning. Metrics used by ThiNet [32], He et al [33], and Liu et al [34] choose channels based on the least error incurred to output feature maps. Hence, the layer-wise feature maps error is used as a saliency metric.…”

Section: Weight and Input Images Based Saliency Metricsmentioning

confidence: 99%

“…The main difference between the approach explored by ThiNet [32] and He et al [33] is how they estimate the damaged feature map. Liu et al [34] also introduce a layer-wise loss alongside the reconstruction error.…”

Section: Weight and Input Images Based Saliency Metricsmentioning

confidence: 99%

“…Taylor expansion-based metrics often approximate the change in global [9], [10], [16], [23] or layer-wise [31] loss caused by pruning and remove weights that cause the least change in loss. ThiNet [32], He et al [33], and Lin et al [34] choose weights that lead to the least feature map reconstruction error. Hur and Kang [35] use the entropy of the weights to determine which weights are least important.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Taxonomy of Saliency Metrics for Channel Pruning

Persand¹,

Anderson

Gregg

2021

IEEE Access

View full text Add to dashboard Cite

Pruning unimportant parameters can allow deep neural networks (DNNs) to reduce their heavy computation and memory requirements. A saliency metric estimates which parameters can be safely pruned with little impact on the classification performance of the DNN. Many saliency metrics have been proposed, each within the context of a wider pruning algorithm. The result is that it is difficult to separate the effectiveness of the saliency metric from the wider pruning algorithm that surrounds it. Similar-looking saliency metrics can yield very different results because of apparently minor design choices. We propose a novel taxonomy of saliency metrics based on four mostly-orthogonal principal components. We show that a broad range of metrics from the pruning literature can be grouped according to these components. Our taxonomy serves as a guide to prior work, and allows us to construct new saliency metrics by exploring novel combinations of our taxonomic components. We perform the first in-depth experimental investigation of more than 300 saliency metrics made up of existing techniques and new combinations of components. Our results provide decisive answers to open research questions. In particular, we demonstrate the importance of reduction and scaling when pruning groups of weights. We also propose a novel scaling method based on the number of weights transitively removed. We find that some of our constructed metrics can outperform the best existing state-of-the-art metrics for convolutional neural network channel pruning. We find further that our novel scaling method improves existing saliency metrics.INDEX TERMS machine learning, convolution neural networks, pruning, saliency metric, model compression VOLUME x, 20xx

show abstract

“…However, a typical deep model is difficult to be deployed on such devices. Due to its limited resource, the model is required to be compressed according to its storage space [9,10], which might reduce the performance. Meanwhile, the inference speed is slow, since its computational capacity is much worse than the external devices having dedicated graphics cards.…”

Section: Introductionmentioning

confidence: 99%

Visibility estimation via deep label distribution learning in cloud environment

Song

Liu

et al. 2021

J Cloud Comp

View full text Add to dashboard Cite

The visibility estimation of the environment has great research and application value in the fields of production. To estimate the visibility, we can utilize the camera to obtain some images as evidence. However, the camera only solves the image acquisition problem, and the analysis of image visibility requires strong computational power. To realize effective and efficient visibility estimation, we employ the cloud computing technique to realize high-through image analysis. Our method combines cloud computing and image-based visibility estimation into a powerful and efficient monitoring framework. To train an accurate model for visibility estimation, it is important to obtain the precise ground truth for every image. However, the ground-truth visibility is difficult to be labeled due to its high ambiguity. To solve this problem, we associate a label distribution to each image. The label distribution contains all the possible visibilities with their probabilities. To learn from such annotation, we employ a CNN-RNN model for visibility-aware feature extraction and a conditional probability neural network for distribution prediction. The estimation result can be improved by fusing the predicting results of multiple images from different views. Our experiment shows that labeling the image with visibility distribution can boost the learning performance, and our method can obtain the visibility from the image efficiently.

show abstract

Discrimination-aware Network Pruning for Deep Model Compression

Cited by 69 publications

References 54 publications

Towards Efficient Post-training Quantization of Pre-trained Language Models

Towards Efficient Post-training Quantization of Pre-trained Language Models

Taxonomy of Saliency Metrics for Channel Pruning

Visibility estimation via deep label distribution learning in cloud environment

Contact Info

Product

Resources

About