2018 IEEE 48th International Symposium on Multiple-Valued Logic (ISMVL)
DOI: 10.1109/ismvl.2018.00039

Efficient Hardware Realization of Convolutional Neural Networks Using Intra-Kernel Regular Pruning

Abstract: The recent trend toward increasingly deep convolutional neural networks (CNNs) leads to a higher demand for computational power and memory storage. Consequently, the deployment of CNNs in hardware has become more challenging. In this paper, we propose an Intra-Kernel Regular (IKR) pruning scheme to reduce the size and computational complexity of CNNs by removing redundant weights at a fine-grained level. Unlike other pruning methods such as Fine-Grained pruning, IKR pruning maintains regular kernel structur…
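
To make the idea above concrete, here is a minimal, hypothetical sketch of intra-kernel pruning in the IKR spirit: every kernel keeps the same small number of weight positions, so the sparse structure stays regular. The single shared layer-wide pattern and the magnitude-based selection rule are simplifying assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def ikr_prune(weights, keep=4):
    """Zero out weights of a conv layer (out_ch, in_ch, kh, kw) so every
    kernel retains the same `keep` positions, chosen from the layer-wide
    average magnitude map; the shared pattern keeps the structure regular."""
    out_ch, in_ch, kh, kw = weights.shape
    avg_mag = np.abs(weights).reshape(-1, kh * kw).mean(axis=0)   # (kh*kw,)
    pattern = np.zeros(kh * kw, dtype=bool)
    pattern[np.argsort(avg_mag)[-keep:]] = True                   # keep the strongest positions
    mask = pattern.reshape(1, 1, kh, kw)                          # broadcast over all kernels
    return weights * mask, mask

# Usage: prune a random 3x3 conv layer down to 4 weights per kernel.
w = np.random.randn(16, 8, 3, 3).astype(np.float32)
w_pruned, mask = ikr_prune(w, keep=4)
print(mask.reshape(3, 3).astype(int))
```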

Cited by 15 publications (9 citation statements) | References 20 publications

“…The majority of works in this direction apply a pretraining-pruning-retraining flow, which is not compatible with the training-on-the-edge paradigm. According to the adopted sparsity scheme, those works can be categorized as unstructured [16,1], structured [24,2,25,26,17,3,27,28,29,30,31,18,32,33], and fine-grained structured [19,34,35,36,37,38,39,40,41], including the pattern-based and block-based ones. A detailed discussion of these sparsity schemes is provided in Appendix A.…”
Section: Sparsity Scheme (mentioning)
confidence: 99%
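
The cited statement distinguishes unstructured, structured, and fine-grained structured sparsity. The sketch below shows, under simple magnitude-based criteria, what a pruning mask under each scheme could look like; the helper names and selection rules are illustrative assumptions, not any cited paper's method.

```python
import numpy as np

def unstructured_mask(w, sparsity=0.75):
    # Prune individual weights anywhere, by a global magnitude threshold.
    thresh = np.quantile(np.abs(w), sparsity)
    return np.abs(w) >= thresh

def structured_mask(w, keep_filters=4):
    # Prune whole output filters, keeping the ones with the largest L2 norm.
    norms = np.linalg.norm(w.reshape(w.shape[0], -1), axis=1)
    keep = np.argsort(norms)[-keep_filters:]
    mask = np.zeros(w.shape, dtype=bool)
    mask[keep] = True
    return mask

def fine_grained_structured_mask(w, keep_per_kernel=4):
    # Pattern/block style: prune inside each kh x kw kernel, but keep the
    # same number of weights per kernel so indexing stays regular.
    oc, ic, kh, kw = w.shape
    flat = np.abs(w).reshape(oc, ic, kh * kw)
    idx = np.argsort(flat, axis=-1)[..., -keep_per_kernel:]
    mask = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return mask.reshape(w.shape)

# Usage: compare the density each scheme leaves behind.
w = np.random.randn(16, 8, 3, 3).astype(np.float32)
print(unstructured_mask(w).mean(), structured_mask(w).mean(),
      fine_grained_structured_mask(w).mean())
```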
“…Furthermore, as with inference acceleration, we find that sparse training closely relates to the adopted sparsity scheme, such as the unstructured [16], structured [17,18], or fine-grained structured [19] schemes, which can result in varying accuracy, training speed, and memory footprint for sparse training. With our effective MEST framework, this paper systematically investigates the sparse training problem with respect to the sparsity schemes.…”
Section: Introduction (mentioning)
confidence: 99%
“…Interestingly, iterative pruning was used by Mallya and Lazebnik [29] to add multiple tasks to a single network. The IKR pruning scheme was advanced by Yang et al. [30] to remove redundant weights at a fine-grained level and showed good performance in a hardware accelerator. To prune deep models for object detection, Ghosh et al. [31] analyzed pruning approaches for detection networks and applied a pruning technique based on agglomerative clustering for the feature extractor and on mutual information for the detector.…”
Section: Related Work (mentioning)
confidence: 99%
“…Complementary to those mobile inference acceleration approaches, DNN model compression techniques provide another path to efficient on-device inference. Two mainstream model compression techniques are weight pruning [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36] and weight quantization [37], [38], [39], [40], [41]. Weight pruning enjoys the great flexibility of various DNN weight sparsity schemes and has achieved very high pruning rates with high accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, pattern-based weight pruning techniques [34], [35] have provided a novel weight sparsity scheme, i.e., fine-grained structured sparsity. It can be viewed as restoring a certain level of flexibility to the previous (coarse-grained) structured sparsity, thus simultaneously boosting the accuracy of structured sparsity and facilitating real-time on-device inference.…”
Section: Introduction (mentioning)
confidence: 99%
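
As a complement to the statement above, here is a hypothetical sketch of pattern-based pruning: each kernel is assigned the mask from a small fixed pattern dictionary that preserves the most weight magnitude. The four-pattern dictionary and the selection rule below are illustrative assumptions, not the pattern sets of [34], [35].

```python
import numpy as np

# Illustrative dictionary of four 4-entry patterns for 3x3 kernels (an assumption).
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=np.float32)

def pattern_prune(weights):
    """For each (out_ch, in_ch) kernel, pick the candidate pattern that keeps
    the largest total |weight|, then zero out everything outside it."""
    mags = np.abs(weights)                                   # (oc, ic, 3, 3)
    scores = np.einsum('oihw,phw->oip', mags, PATTERNS)      # pattern scores per kernel
    best = scores.argmax(axis=-1)                            # (oc, ic) chosen pattern ids
    mask = PATTERNS[best]                                    # (oc, ic, 3, 3)
    return weights * mask, best

# Usage: prune a random 3x3 conv layer so every kernel follows one pattern.
w = np.random.randn(16, 8, 3, 3).astype(np.float32)
w_pruned, chosen = pattern_prune(w)
print(np.bincount(chosen.ravel()))   # how often each pattern was selected
```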