Proceedings of the International Conference on Parallel Architectures and Compilation Techniques 2022
DOI: 10.1145/3559009.3569691

Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM Routine on Ampere GPUs

Cited by 6 publications (7 citation statements)
References 18 publications
“…Vector-wise pruning can accelerate sparse routines on GPUs. However, if the vector length is greater than 8, it can significantly reduce the accuracy [4,5,25]. The results demonstrate that the V:N:M format occupies an intermediate position between unstructured and vector-wise pruning.…”
Section: Energy Evaluation of V:N:M
confidence: 95%
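To make the pruning granularities in this statement concrete, the following minimal NumPy sketch (an illustrative assumption, not code from the cited paper or any of the referenced libraries) applies magnitude-based N:M pruning: the N largest-magnitude weights survive in every group of M consecutive weights along a row, with the 2:4 case matching the pattern accepted by Ampere sparse tensor cores. The V:N:M format discussed in the quote adds a vector-wise selection on top of this per-group rule, which is why it sits between unstructured and vector-wise pruning.

# Hypothetical sketch: magnitude-based N:M pruning of a weight matrix.
import numpy as np

def prune_n_m(weights, n=2, m=4):
    """Keep the n largest-magnitude entries in each group of m consecutive
    weights along the last axis; zero the rest."""
    rows, cols = weights.shape
    assert cols % m == 0, "column count must be a multiple of m"
    groups = weights.reshape(rows, cols // m, m)
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

W = np.random.randn(8, 16).astype(np.float32)
W_24 = prune_n_m(W, n=2, m=4)   # 50% sparsity in the 2:4 pattern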
“…In this context, cuSparseLt SpMM implementation is the reference library to exploit the 2:4 format on SPTCs. Since there are no SpMM GPU implementations for arbitrary N:M sparsity levels, we have considered in the evaluation the following third-party libraries that support half-precision: Sputnik [11], and CLASP [4] which extends vectorSparse [5] to the latest generations of NVIDIA GPU architectures. While [11] has been designed for non-structured sparse matrices, [4] is focused on semi-structured sparse input matrices following the column-vector sparse format, which supports vector lengths l = 2, 4 and 8.…”
Section: Comparison with Existing Dense and Sparse Libraries
confidence: 99%
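As a concrete illustration of the column-vector sparse format mentioned above, the sketch below (a hypothetical NumPy example under assumed scoring rules, not the actual CLASP or vectorSparse code) prunes weights in vertical vectors of length l, scoring each l x 1 column vector by its L2 norm and keeping only the strongest fraction, so that the surviving non-zeros stay aligned in vectors of the supported lengths l = 2, 4 or 8.

# Hypothetical sketch: vector-wise (column-vector) pruning with vector length l.
import numpy as np

def prune_column_vectors(weights, l=8, keep_ratio=0.5):
    """Zero out whole l x 1 column vectors, keeping roughly `keep_ratio`
    of the vectors with the largest L2 norm."""
    rows, cols = weights.shape
    assert rows % l == 0, "row count must be a multiple of the vector length"
    blocks = weights.reshape(rows // l, l, cols)      # (row groups, l, cols)
    scores = np.linalg.norm(blocks, axis=1)           # one score per l x 1 vector
    k = int(keep_ratio * scores.size)
    threshold = np.sort(scores, axis=None)[::-1][k - 1]
    mask = (scores >= threshold)[:, None, :]          # broadcast over the l dimension
    return (blocks * mask).reshape(rows, cols)

W = np.random.randn(64, 64).astype(np.float32)
W_cv = prune_column_vectors(W, l=8, keep_ratio=0.5)

Larger l makes the surviving non-zeros more regular and easier to map onto GPU tiles, which is exactly the accuracy-versus-regularity trade-off the quoted statement attributes to vector lengths greater than 8.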
“…Sparse input matrices are inherently shaped by the pruning algorithms, which can generate highly irregular sparse matrices [16]. This irregularity, however, can significantly undermine performance due to inefficient hardware utilization [4]. Therefore, a new trend of semi-structured pruning techniques, which aims to find trade-offs between performance and accuracy, can yield quite structured patterns that offer better performance, but little to no room for tuning their representation [29].…”
Section: Introduction
confidence: 99%
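For contrast with the semi-structured schemes above, here is a minimal, hypothetical sketch (not tied to any cited implementation) of unstructured global magnitude pruning; the uneven per-row non-zero counts it produces are the kind of irregularity that the quoted statement links to inefficient hardware utilization.

# Hypothetical sketch: unstructured global magnitude pruning.
import numpy as np

def prune_unstructured(weights, sparsity=0.9):
    """Zero the smallest-magnitude entries until `sparsity` of them are gone."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    threshold = np.partition(flat, k)[k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

W = np.random.randn(256, 256).astype(np.float32)
W_sparse = prune_unstructured(W, sparsity=0.9)
nnz_per_row = np.count_nonzero(W_sparse, axis=1)
print(nnz_per_row.min(), nnz_per_row.max())  # rows receive uneven amounts of work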
“…Problem 2: Poor generalization to new input problems and other platforms. The daunting task of crafting efficient kernels for sparse computation in DL has spurred the proliferation of specialized kernels tailored to address specific input problem shapes and hardware architectures [4], [7], [16]. This limitation recognizes the inherent difficulty of preserving the performance across all conceivable scenarios.…”
Section: Introduction
confidence: 99%