PCNN: Pattern-based Fine-Grained Regular Pruning towards Optimizing CNN Accelerators

Tan, Zhanhong; Song, Jiebo; Ma, Xiaolong; Tan, Sia-Huat; Chen, Hongyang; Miao, Yuanqing; Wu, Yifu; Ye, Shaokai; Wang, Yanzhi; Li, Dehui; Ma, Kaisheng

doi:10.48550/arxiv.2002.04997

Cited by 3 publications

(5 citation statements)

References 15 publications


“…In fine-grained pruning, the set of weights to be pruned can be chosen arbitrarily, it can achieve a very high compression ratio on CNN [115], RNN [92], LSTM [112] and Transformers [51] without hurting accuracy. • Pattern-based pruning is a special kind of fine-grained pruning which has better hardware acceleration with compiler optimization [203,216,279]. It assigns a fixed set of masks to each 3×3 kernel.…”

Section: Granularity (mentioning)

confidence: 99%
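The statement above describes pattern-based pruning as assigning one mask from a fixed set to each 3×3 kernel. A minimal sketch of that idea follows. The four-mask library `PATTERNS` and the "keep the largest total magnitude" selection rule are illustrative assumptions, not the masks or criterion used by PCNN or the cited survey.

```python
# Hypothetical pattern library: four masks over a flattened 3x3 kernel,
# each keeping exactly 4 of the 9 weights (an assumption for illustration).
PATTERNS = [
    (1, 1, 0,  1, 1, 0,  0, 0, 0),
    (0, 1, 1,  0, 1, 1,  0, 0, 0),
    (0, 0, 0,  1, 1, 0,  1, 1, 0),
    (0, 0, 0,  0, 1, 1,  0, 1, 1),
]

def best_pattern(kernel, patterns):
    """Return the mask that retains the largest total weight magnitude."""
    return max(patterns,
               key=lambda mask: sum(abs(w) * b for w, b in zip(kernel, mask)))

def prune_kernel(kernel, patterns):
    """Zero every weight of a flattened 3x3 kernel outside the chosen mask."""
    mask = best_pattern(kernel, patterns)
    return [w * b for w, b in zip(kernel, mask)]

# Flattened 3x3 kernel, row-major order.
kernel = [0.9, -0.1, 0.0,
          0.8,  0.7, 0.05,
         -0.02, 0.6, 0.01]
pruned = prune_kernel(kernel, PATTERNS)
print(pruned)  # only positions 0, 1, 3, 4 (the first mask) survive
```

Because every kernel ends up with one of a few known masks, a compiler or accelerator only needs to handle that small set of sparsity layouts, which is the hardware advantage the statement refers to.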

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Cai

Lin

et al. 2022

ACM Trans. Des. Autom. Electron. Syst.


Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition. However, their superior performance comes at the considerable cost of computational complexity, which greatly hinders their applications in many resource-constrained devices, such as mobile phones and Internet of Things (IoT) devices. Therefore, methods and techniques that are able to lift the efficiency bottleneck while preserving the high accuracy of DNNs are in great demand to enable numerous edge AI applications. This article provides an overview of efficient deep learning methods, systems, and applications. We start from introducing popular model compression methods, including pruning, factorization, quantization, as well as compact model design. To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization. We then cover efficient on-device training to enable user customization based on the local data on mobile devices. Apart from general acceleration techniques, we also showcase several task-specific accelerations for point cloud, video, and natural language processing by exploiting their spatial sparsity and temporal/token redundancy. Finally, to support all these algorithmic advancements, we introduce the efficient deep learning system design from both software and hardware perspectives.


“…During the hardware deployment process, using standard convolution in the conventional CNN model can considerably reduce memory access through mature data reuse technology [21, 22]. At present, the mainstream CNN model pruning methods are usually divided into three types: non-structured [12, 13, 14, 15], structured [16, 17, 18, 19, 20, 23], and pattern [24, 25, 26] pruning, as shown in Figure 1.…”

Section: Introduction (mentioning)

confidence: 99%

“…To combine the advantages of both methods, pattern pruning was proposed [24, 25, 26]. Pattern pruning aims to find an intermediate sparse dimension to combine the high accuracy of small-grained pruning models with the high regularity of large-grained pruning models.…”

Section: Introduction (mentioning)

confidence: 99%

“…The object of pattern pruning is also the weights, but it selects some specific convolutional kernel pruning patterns by analyzing the importance of each weight, and pruning is performed strictly according to these patterns. Tan et al. [26] achieved lossless pruning with a 60% pruning rate according to this approach. Actually, pattern pruning only reduces the number of convolutional kernels’ pruning patterns for non-structured pruning and guarantees the same number of residual weights for each convolutional kernel, solving the problem of unbalanced workload between computational modules of different channels during hardware deployment.…”

Section: Introduction (mentioning)

confidence: 99%
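The workload-balance point in the statement above can be illustrated with a small comparison. The sketch below assumes synthetic, tie-free kernel weights and uses a per-kernel top-k rule as a stand-in for pattern selection (real pattern pruning additionally restricts which positions may survive). All function names are hypothetical.

```python
def unstructured_prune(kernels, keep_total):
    """Global magnitude pruning: keep the keep_total largest weights overall."""
    ranked = sorted(
        ((ki, wi) for ki, k in enumerate(kernels) for wi in range(len(k))),
        key=lambda p: abs(kernels[p[0]][p[1]]),
        reverse=True,
    )
    kept = set(ranked[:keep_total])
    return [[w if (ki, wi) in kept else 0.0 for wi, w in enumerate(k)]
            for ki, k in enumerate(kernels)]

def per_kernel_prune(kernels, keep):
    """Keep the `keep` largest-magnitude weights inside every kernel."""
    out = []
    for k in kernels:
        top = set(sorted(range(len(k)), key=lambda i: abs(k[i]), reverse=True)[:keep])
        out.append([w if i in top else 0.0 for i, w in enumerate(k)])
    return out

def nonzeros(kernels):
    """Number of surviving weights in each kernel."""
    return [sum(1 for w in k if w != 0.0) for k in kernels]

# Four synthetic flattened 3x3 kernels with increasing weight scale.
kernels = [[(i + 1) * (j + 1) + i / 10 for j in range(9)] for i in range(4)]

# Same overall budget (16 of 36 weights) for both schemes.
print(nonzeros(unstructured_prune(kernels, keep_total=16)))  # [0, 3, 6, 7]
print(nonzeros(per_kernel_prune(kernels, keep=4)))           # [4, 4, 4, 4]
```

With global pruning, compute units assigned to different kernels would receive between 0 and 7 multiplications each. With a fixed per-kernel count, every unit does the same amount of work.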


A Hardware-Friendly High-Precision CNN Pruning Method and Its FPGA Implementation

Sui

Zhi

et al. 2023

Sensors


To address the problems of large storage requirements, computational pressure, untimely data supply of off-chip memory, and low computational efficiency during hardware deployment due to the large number of convolutional neural network (CNN) parameters, we developed an innovative hardware-friendly CNN pruning method called KRP, which prunes the convolutional kernel on a row scale. A new retraining method based on LR tracking was used to obtain a CNN model with both a high pruning rate and accuracy. Furthermore, we designed a high-performance convolutional computation module on the FPGA platform to help deploy KRP pruning models. The results of comparative experiments on CNNs such as VGG and ResNet showed that KRP has higher accuracy than most pruning methods. At the same time, the KRP method, together with the GSNQ quantization method developed in our previous study, forms a high-precision hardware-friendly network compression framework that can achieve “lossless” CNN compression with a 27× reduction in network model storage. The results of the comparative experiments on the FPGA showed that the KRP pruning method not only requires much less storage space, but also helps to reduce the on-chip hardware resource consumption by more than half and effectively improves the parallelism of the model in FPGAs with a strong hardware-friendly feature. This study provides more ideas for the application of CNNs in the field of edge computing.


“…However, unstructured sparsity struggles to take advantage of vector-processing architectures such as SIMD and poorly utilizes memory buses, which increases latency due to dependent sequences of reads (Nvidia, 2020). Compared with unstructured sparsity, structured sparsity is more friendly to hardware, especially for block pruning, kernel shape sparsity (Tan et al, 2020) or channel and filter pruning (Wen et al, 2016). Although structured sparsity can speed up DNNs on commodity hardware, it hurts model performance more significantly than unstructured fine-grained sparsity.…”

Section: Introduction (mentioning)

confidence: 99%

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Zhou

Ma

Zhu

et al. 2021

Preprint


Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity that zeroes out multiple individual weights distributed across the neural network, and structured coarse-grained sparsity which prunes blocks of sub-networks of a neural network. Fine-grained sparsity can achieve a high compression ratio but is not hardware friendly and hence receives limited speed gains. On the other hand, coarse-grained sparsity cannot concurrently achieve both apparent acceleration on modern GPUs and decent performance. In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network, which can maintain the advantages of both unstructured fine-grained sparsity and structured coarse-grained sparsity simultaneously on specifically designed GPUs. Specifically, a 2:4 sparse network could achieve 2× speed-up without performance drop on Nvidia A100 GPUs. Furthermore, we propose a novel and effective ingredient, sparse-refined straight-through estimator (SR-STE), to alleviate the negative influence of the approximated gradients computed by vanilla STE during optimization. We also define a metric, Sparse Architecture Divergence (SAD), to measure the sparse network's topology change during the training process. Finally, we justify SR-STE's advantages with SAD and demonstrate the effectiveness of SR-STE by performing comprehensive experiments on various tasks. Source codes and models are available at https://github.com/NM-sparsity/NM-sparsity.
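The N:M constraint named in this abstract (at most N non-zero weights in every group of M consecutive weights, for example 2:4) can be sketched as a one-shot magnitude projection. This is only an illustration of the sparsity pattern. It does not implement the paper's SR-STE training procedure, and the function name `nm_prune` is hypothetical.

```python
def nm_prune(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in each group of m consecutive weights."""
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    out = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes within this group.
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out

row = [0.3, -1.2, 0.05, 0.7, -0.4, 0.1, 0.9, -0.2]
print(nm_prune(row))  # [0.0, -1.2, 0.0, 0.7, -0.4, 0.0, 0.9, 0.0]
```

Each group of four keeps exactly two values, so the layout is regular enough for hardware to exploit while the choice of which two survive stays fine-grained.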

