2020
DOI: 10.48550/arxiv.2002.04997
Preprint

PCNN: Pattern-based Fine-Grained Regular Pruning towards Optimizing CNN Accelerators

Abstract: Weight pruning is a powerful technique to realize model compression. We propose PCNN, a fine-grained regular 1D pruning method. A novel index format called Sparsity Pattern Mask (SPM) is presented to encode the sparsity in PCNN. Leveraging SPM with limited pruning patterns and non-zero sequences of equal length, PCNN can be efficiently employed in hardware. Evaluated on VGG-16 and ResNet-18, our PCNN achieves a compression rate of up to 8.4× with only 0.2% accuracy loss. We also implement a pattern-aware archi…
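The abstract names the SPM index format but this page does not reproduce its details. Purely as a hedged illustration of the stated idea (a small codebook of pruning patterns plus equal-length non-zero value sequences per kernel), here is a minimal NumPy sketch; the codebook contents, pattern count, and function names are assumptions, not taken from the paper.

import numpy as np

# Assumed SPM-style encoding: every 3x3 kernel is pruned to one of a few
# allowed binary patterns, all with the same number of non-zeros, so a kernel
# is stored as a pattern index plus a fixed-length value sequence.

# A tiny assumed codebook: three 3x3 patterns, each keeping 4 of 9 weights.
PATTERNS = np.array([
    [[1, 0, 1], [0, 0, 0], [1, 0, 1]],
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [1, 1, 1], [0, 1, 0]],
], dtype=bool)

def encode_kernel(kernel):
    # Pick the pattern that preserves the most weight magnitude.
    idx = int(np.argmax([np.abs(kernel[p]).sum() for p in PATTERNS]))
    return idx, kernel[PATTERNS[idx]]  # SPM index + equal-length non-zero sequence

def decode_kernel(idx, values):
    # Rebuild the pruned 3x3 kernel from its SPM index and value sequence.
    kernel = np.zeros((3, 3), dtype=values.dtype)
    kernel[PATTERNS[idx]] = values
    return kernel

rng = np.random.default_rng(0)
k = rng.standard_normal((3, 3)).astype(np.float32)
idx, vals = encode_kernel(k)
print(idx, vals, decode_kernel(idx, vals), sep="\n")

Because every kernel stores exactly four values here, all non-zero sequences have equal length, which is what makes such a layout regular enough for hardware.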

Cited by 3 publications (5 citation statements) · References 15 publications

“…In fine-grained pruning, the set of weights to be pruned can be chosen arbitrarily, so it can achieve a very high compression ratio on CNNs [115], RNNs [92], LSTMs [112], and Transformers [51] without hurting accuracy. • Pattern-based pruning is a special kind of fine-grained pruning that offers better hardware acceleration with compiler optimization [203,216,279]. It assigns a fixed set of masks to each 3×3 kernel.…”
Section: Granularity (mentioning)
confidence: 99%
“…During the hardware deployment process, using standard convolution in the conventional CNN model can considerably reduce memory access through mature data reuse technology [21,22]. At present, the mainstream CNN model pruning methods are usually divided into three types: non-structured [12,13,14,15], structured [16,17,18,19,20,23], and pattern [24,25,26] pruning, as shown in Figure 1.…”
Section: Introduction (mentioning)
confidence: 99%
“…To combine the advantages of both methods, pattern pruning was proposed [24,25,26]. Pattern pruning aims to find an intermediate sparse dimension to combine the high accuracy of small-grained pruning models with the high regularity of large-grained pruning models.…”
Section: Introduction (mentioning)
confidence: 99%
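To make the three granularities in the quote above concrete, here is a minimal NumPy sketch on a conv weight tensor of shape (out_channels, in_channels, 3, 3). The thresholds, norms, and the single fixed pattern are illustrative assumptions, not taken from the cited works; real pattern pruning selects a mask per kernel from a small codebook, as in the SPM sketch above.

import numpy as np

def unstructured_prune(w, sparsity=0.75):
    # Non-structured: zero the smallest-magnitude weights anywhere in the tensor.
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= thresh, w, 0.0)

def structured_prune(w, keep_filters=2):
    # Structured: keep only the output filters with the largest L1 norm.
    norms = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)
    mask = np.zeros(w.shape[0], dtype=bool)
    mask[np.argsort(norms)[-keep_filters:]] = True
    return w * mask[:, None, None, None]

def pattern_prune(w, pattern):
    # Pattern: force every 3x3 kernel to a fixed binary pattern
    # (real pattern pruning chooses per kernel from a codebook).
    return w * pattern  # the (3, 3) pattern broadcasts over all kernels

rng = np.random.default_rng(1)
w = rng.standard_normal((4, 3, 3, 3)).astype(np.float32)
cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=np.float32)
for name, pruned in [("non-structured", unstructured_prune(w)),
                     ("structured", structured_prune(w)),
                     ("pattern", pattern_prune(w, cross))]:
    print(f"{name}: {int((pruned != 0).sum())}/{pruned.size} non-zeros")

Pattern pruning sits between the other two: it zeroes individual weights (fine-grained) but only in regular, enumerable shapes (hardware-friendly).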
“…However, unstructured sparsity struggles to take advantage of vector-processing architectures such as SIMD and poorly utilizes memory buses, which increases latency due to dependent sequences of reads (Nvidia, 2020). Compared with unstructured sparsity, structured sparsity is more friendly to hardware, especially for block pruning, kernel shape sparsity (Tan et al., 2020) or channel and filter pruning (Wen et al., 2016). Although structured sparsity can speed up DNNs on commodity hardware, it hurts model performance more significantly than unstructured fine-grained sparsity.…”
Section: Introduction (mentioning)
confidence: 99%
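The quoted hardware argument can be illustrated with a small sketch: an unstructured (CSR-style) layout needs an irregular gather from the input vector for every non-zero, while a block-regular layout reads contiguous, unit-stride slices that vector units handle well. Layouts and names below are illustrative only, not from any cited work.

import numpy as np

def csr_matvec(values, cols, row_ptr, x):
    # Unstructured layout: every row does an irregular gather x[cols[...]].
    y = np.zeros(len(row_ptr) - 1, dtype=x.dtype)
    for r in range(len(y)):
        lo, hi = row_ptr[r], row_ptr[r + 1]
        y[r] = values[lo:hi] @ x[cols[lo:hi]]  # scattered reads from x
    return y

def block_matvec(n_rows, blocks, x, bs=4):
    # Block-regular layout: each stored block multiplies a contiguous,
    # unit-stride slice of x -- the access pattern SIMD units are built for.
    y = np.zeros(n_rows, dtype=x.dtype)
    for rb, cb, block in blocks:
        y[rb * bs:(rb + 1) * bs] += block @ x[cb * bs:(cb + 1) * bs]
    return y

rng = np.random.default_rng(2)
x = rng.standard_normal(8).astype(np.float32)
blocks = [(0, 1, rng.standard_normal((4, 4)).astype(np.float32))]  # one 4x4 block

# The same matrix in CSR form stores 16 scattered (value, column) pairs.
dense = np.zeros((8, 8), dtype=np.float32)
dense[0:4, 4:8] = blocks[0][2]
r, c = np.nonzero(dense)
row_ptr = np.searchsorted(r, np.arange(9))
print(np.allclose(csr_matvec(dense[r, c], c, row_ptr, x),
                  block_matvec(8, blocks, x)))  # True: same product, different layout

PCNN's equal-length non-zero sequences aim at the same property as the block layout here: predictable, regular memory accesses without giving up fine-grained sparsity.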