Proceedings of the 44th Annual International Symposium on Computer Architecture 2017
DOI: 10.1145/3079856.3080215
Scalpel

Cited by 169 publications (26 citation statements)
References 22 publications
“…Comparison of unstructured pruning applied to FPGA and GPGPU/CPU is quite challenging, since the latter platforms cannot directly benefit from DL models compression unless a dedicated data structuring scheme is implemented [39,40,41,42]. This scheme is essential to take advantage of sparse vector-matrix and matrix-matrix multiplication operations, which are much more efficient than their dense counterparts, provided the data is prepared properly.…”
Section: Discussion (citation type: mentioning)
Confidence: 99%
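The statement above notes that CPUs and GPGPUs only benefit from unstructured pruning once the sparse weights are restructured into a dedicated format. A minimal sketch of what such a scheme looks like is the CSR (compressed sparse row) layout with a sparse matrix-vector product; this is an illustrative example, not the specific scheme of any of the cited works:

```python
import numpy as np

def to_csr(dense):
    """Convert a dense 2-D array to CSR triples (values, col_idx, row_ptr),
    storing only the nonzero entries that survive pruning."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """y = A @ x computed over the stored nonzeros only."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

# Small pruned weight matrix: most entries are zero.
A = np.array([[0., 2., 0.],
              [1., 0., 3.],
              [0., 0., 0.]])
x = np.array([1., 2., 3.])
vals, cols, ptr = to_csr(A)
print(csr_matvec(vals, cols, ptr, x))  # matches A @ x -> [ 4. 10.  0.]
```

The work done is proportional to the number of nonzeros rather than the full matrix size, which is the advantage the statement refers to; the irregular indexed gather `x[col_idx[...]]` is also exactly the memory-access pattern that makes these kernels hard to run efficiently on parallel hardware.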
“…By further coupling pruning with quantization and efficient coding, in a scheme called Deep Compression, they achieved up to 49x size reduction [26]. However, deploying pruned models on highly-parallel architectures has proven problematic due to storage overhead and irregular memory access patterns of sparse matrix multiplication [65,74].…”
Section: Weight Pruning (citation type: mentioning)
Confidence: 99%
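The pruning step underlying the Deep Compression scheme mentioned above is magnitude-based: weights below a threshold are zeroed out. A hedged sketch of that step (a simplified global variant, not the cited authors' exact procedure, which also involves retraining, quantization, and coding):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.
    Simplified one-shot global pruning; ties at the threshold may prune
    slightly more than the requested fraction."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.arange(1.0, 11.0)          # weights 1..10
pruned = magnitude_prune(w, 0.5)  # drop the 5 smallest magnitudes
print(np.count_nonzero(pruned))   # -> 5
```

The resulting zeros reduce model size only after a sparse storage format is applied, which is the storage-overhead caveat the statement raises for highly parallel architectures.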
“…Sredojevic et al [65] have proposed an algorithmic way of inducing regularity in sparse networks. Yu et al [74] have developed a hardware-aware pruning method called Scalpel, which matches the coarseness of pruning to the parallelism of underlying hardware. Our approach to packing is based on Scalpel, but applied to binarized models and using CPU bitwidth as packing granularity, while also permuting layer inputs to improve packing opportunities.…”
Section: Weight Pruning (citation type: mentioning)
Confidence: 99%
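The statement above describes packing binarized weights at CPU-bitwidth granularity. A minimal sketch of the underlying idea, assuming the standard XNOR-style encoding of +1/-1 values as single bits (this is illustrative, not the cited paper's implementation):

```python
def pack_bits(signs):
    """Pack a sequence of +1/-1 values into one integer word,
    with bit i set iff signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == +1:
            word |= 1 << i
    return word

def binary_dot(wa, wb, n):
    """Dot product of two packed +1/-1 vectors of length n:
    each mismatched bit contributes -1, each match +1,
    so dot = n - 2 * popcount(wa XOR wb)."""
    mismatches = bin(wa ^ wb).count("1")
    return n - 2 * mismatches

a = [+1, -1, +1, +1]
b = [+1, +1, -1, +1]
# True dot product: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
print(binary_dot(pack_bits(a), pack_bits(b), 4))  # -> 0
```

Because a whole machine word of weights is consumed by one XOR and one population count, the natural packing granularity is the CPU bitwidth, which is the design choice the statement attributes to the approach built on Scalpel.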