Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2019
DOI: 10.1145/3297858.3304028
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations

Abstract: This paper describes a novel approach of packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., 4x) due to the increased density of nonzeros in the resulting packed filter matrix. In combining columns, for each row, all filter weights but one with the largest magnitude are pruned. We retra…
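The core packing step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: it assumes columns are combined in fixed consecutive groups (the paper selects which column subsets to combine so as to maximize density), and the names `column_combine` and `group_size` are illustrative.

```python
import numpy as np

def column_combine(W, group_size):
    """Pack columns of a sparse filter matrix into denser groups.

    Within each group of `group_size` columns, every row keeps only its
    largest-magnitude weight and the rest are pruned, so each group
    collapses into a single dense column (one surviving weight per row).
    Assumes the number of columns is a multiple of group_size.
    """
    rows, cols = W.shape
    packed_cols = []
    for start in range(0, cols, group_size):
        group = W[:, start:start + group_size]
        # For each row, the index of the largest-magnitude weight in the group.
        keep = np.abs(group).argmax(axis=1)
        # Gather one surviving weight per row; all others are pruned.
        packed_cols.append(group[np.arange(rows), keep])
    return np.stack(packed_cols, axis=1)
```

For example, an 8-column sparse matrix packed with `group_size=4` yields a 2-column matrix, matching the roughly 4x utilization gain the abstract cites when most pruned entries were already zero.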


Cited by 125 publications (86 citation statements)
References 63 publications
“…Unlike previous work, column combining is a new pruning method which allows for sparse CNN layers, but requires that the remaining sparse weights can be packed into a denser format when deployed in hardware [27]. In our proposed training pipeline, we use column combining in addition to weight and data quantization as discussed in the previous section, in order to achieve efficient sparse CNN inference.…”
Section: Weight Pruning
confidence: 99%
“…Figure 3: A pointwise convolution layer (left) with four channels per group resulting from weight pruning training for column combining [27]. After combining columns in the filter matrix (left), each group of four channels (shown in cream and green) is reduced into a single column (right).…”
Section: Layer As Stored In Systolic Array
confidence: 99%