Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

Mao, Huizi; Han, Song; Pool, Jeff; Li, Wenshuo; Liu, Xingyu; Wang, Yu; Dally, William J.

doi:10.48550/arxiv.1705.08922

Cited by 82 publications

(66 citation statements)

References 11 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This metric is directly related to the energy efficiency of (FPGA or ASIC) hardware implementation. As can be observed in the table, the proposed ADMM framework achieves significant amount of computation reduction compared with prior work, even when some [36,53] also focus on computation reductions. For the first metric of computation reduction, the improvement can be close to 3× compared with prior work for CONV layers, and this improvement reaches 3.6× for the second metric.…”

Section: Computation Reduction Comparisonsmentioning

confidence: 85%

See 1 more Smart Citation

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

Ren,

Zhang,

et al. 2018

Preprint

View full text Add to dashboard Cite

To facilitate efficient embedded and hardware implementations of deep neural networks (DNNs), a number of prior work are dedicated to model compression techniques. The target is to simultaneously reduce the model storage size and accelerate the computation, with minor effect on accuracy. Two important categories of DNN model compression techniques are weight pruning and weight quantization. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in bit representation of weights. These two sources of redundancy can be combined, thereby leading to a higher degree of DNN model compression. However, there lacks a systematic framework of joint weight pruning and quantization of DNNs, thereby limiting the available model compression ratio. Moreover, the computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for besides simply model size reduction.To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework of DNNs using Alternating Direction Method of Multipliers (ADMM), a powerful technique to deal with non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique with regularization target dynamically updated in each ADMM iteration, thereby resulting in higher performance in model compression than prior work. The second part is hardware-aware DNN optimizations to facilitate hardware-level implementations. We perform ADMM-based weight pruning and quantization accounting for (i) the computation reduction and energy efficiency improvement, and (ii) the hardware performance overhead due to irregular sparsity. The first requirement prioritizes the convolutional layer compression over fully-connected layers, while the latter requires a concept of the break-even pruning ratio, defined as the minimum pruning ratio of a specific layer that results in no hardware performance degradation.Without accuracy loss, we can achieve 85× and 24× pruning on LeNet-5 and AlexNet models, respectively, significantly higher than prior work. The improvement becomes more significant when focusing on computation reductions. Combining weight pruning and quantization, we achieve 1,910× and 231× reductions in overall model size on these two benchmarks, when focusing on data storage. Highly promising results are also observed on other representative to appear in ASPLOS 2019 DNNs such as VGGNet and ResNet-50. We release codes and models at anonymous link http://bit.ly/2M0V7DO.

show abstract

Section: Computation Reduction Comparisonsmentioning

confidence: 85%

“…Next we compare on the synthesized hardware speedup results between the proposed hardware-aware DNN model compression algorithm with baselines. The baselines include the iterative weight pruning and weight clustering work [22,24], and recent work [36,53]…”

Section: Results and Discussion On Computation Reduction And Hardware...mentioning

confidence: 99%

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

Ren,

Zhang,

et al. 2018

Preprint

View full text Add to dashboard Cite

show abstract

“…surg. [7] 80.0% +0.2% 3.1× NeST [5] 80.3% −0.1% 3.2× Fine-grained [30] Table 2 compares the pruning rates on the CONV layers vs. Top-5 accuracy, since the CONV layers are the most computationally intensive in state-of-art DNNs. We achieve 8.6× pruning in CONV layers with even slight accuracy enhancement, and 11.2× pruning with minor accuracy loss, consistently outperforming prior work in CONV layer weight pruning.…”

Section: Experimental Results On Weight Pruningmentioning

confidence: 99%

Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM

Ye¹,

Feng²,

Zhang³

et al. 2019

Preprint

View full text Add to dashboard Cite

Weight pruning and weight quantization are two important categories of DNN model compression. Prior work on these techniques are mainly based on heuristics. A recent work developed a systematic framework of DNN weight pruning using the advanced optimization technique ADMM (Alternating Direction Methods of Multipliers), achieving one of state-of-art in weight pruning results. In this work, we first extend such one-shot ADMM-based framework to guarantee solution feasibility and provide fast convergence rate, and generalize to weight quantization as well. We have further developed a multi-step, progressive DNN weight pruning and quantization framework, with dual benefits of (i) achieving further weight pruning/quantization thanks to the special property of ADMM regularization, and (ii) reducing the search space within each step. Extensive experimental results demonstrate the superior performance compared with prior work. Some highlights: (i) we achieve 246×, 36×, and 8× weight pruning on LeNet-5, AlexNet, and ResNet-50 models, respectively, with (almost) zero accuracy loss; (ii) even a significant 61× weight pruning in AlexNet (ImageNet) results in only minor degradation in actual accuracy compared with prior work; (iii) we are among the first to derive notable weight pruning results for ResNet and Mo-bileNet models; (iv) we derive the first lossless, fully binarized (for all layers) LeNet-5 for MNIST and VGG-16 for CIFAR-10; and (v) we derive the first fully binarized (for all layers) ResNet for ImageNet with reasonable accuracy loss. Our models and sample codes are released in link https://bit.ly/2TYx7Za.

show abstract

“…Sparsity in Deep Neural Nets has also been extensively explored, and the work in this area can be categorized as unstructured or structured types. In particular, unstructured pruning approaches (Han et al, 2015a,b;Dai et al, 2019;Mao et al, 2017;Narang et al, 2017a) result in random sparsity in weight matrices, which is difficult to accelerate on general-purpose hardware due to storage and memory access overheads. This has motivated structured sparsity-based approaches for both CNNs Wen et al, 2016;Narang et al, 2017a) and RNNs (Lu et al, 2016;Wen et al, 2017;Narang et al, 2017b).…”

Section: Related Workmentioning

confidence: 99%

Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training

Sarma,

Singh,

Jiang

et al. 2021

Preprint

View full text Add to dashboard Cite

Recurrent Neural Networks (RNNs), more specifically their Long Short-Term Memory (LSTM) variants, have been widely used as a deep learning tool for tackling sequence-based learning tasks in text and speech. Training of such LSTM applications is computationally intensive due to the recurrent nature of hidden state computation that repeats for each time step. While sparsity in Deep Neural Nets has been widely seen as an opportunity for reducing computation time in both training and inference phases, the usage of non-ReLU activation in LSTM RNNs renders the opportunities for such dynamic sparsity associated with neuron activation and gradient values to be limited or non-existent. In this work, we identify dropout induced sparsity for LSTMs as a suitable mode of computation reduction. Dropout is a widely used regularization mechanism, which randomly drops computed neuron values during each iteration of training. We propose to structure dropout patterns, by dropping out the same set of physical neurons within a batch, resulting in column (row) level hidden state sparsity, which are well amenable to computation reduction at run-time in general-purpose SIMD hardware as well as systolic arrays. We provide a detailed analysis of how the dropoutinduced sparsity propagates through the different stages of network training and how it can be leveraged in each stage. More importantly, our proposed approach works as a direct replacement for existing dropout-based application settings. We conduct our experiments for three representative NLP tasks: language modelling on the PTB dataset, OpenNMT based machine translation using the IWSLT De-En and En-Vi datasets, and named entity recognition sequence labelling using the CoNLL-2003 shared task. We demonstrate that our proposed approach can be used to translate dropout-based computation reduction into reduced training time, with improvement ranging from 1.23× to 1.64×, without sacrificing the target metric.Preprint. Under review.

show abstract

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

Cited by 82 publications

References 11 publications

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM

Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training

Contact Info

Product

Resources

About