Proceedings of the 59th ACM/IEEE Design Automation Conference 2022
DOI: 10.1145/3489517.3530588
Shfl-BW


Cited by 8 publications (6 citation statements)
References 3 publications
“…As the synthetic matrices are intended for benchmarking purposes, they have been populated with random values. For each of these input sets, different density levels (d ∈ {0.05, 0.10, 0.20, 0.30, 0.50, 0.09}) and pruning configurations for the vector-wise [24] (l ∈ {16, 32, 64, 128}) and VENOM [5] (V ∈ {32, 64, 128}) formats were considered.…”
Section: Results
confidence: 99%
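The pruning recipe quoted above is easy to mock up. Below is a minimal Python/NumPy sketch of such a synthetic-matrix generator; the function name, the magnitude-based vector selection, and the consecutive-row grouping are illustrative assumptions, not the cited benchmark code.

```python
import numpy as np

def synthetic_vector_wise(rows, cols, density, l, seed=0):
    # Hypothetical generator (not the cited code): random matrix pruned
    # vector-wise so that roughly `density` of its entries survive.
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((rows, cols)).astype(np.float32)
    assert rows % l == 0, "vector length l must divide the row count"
    v = m.reshape(rows // l, l, cols)       # group rows into length-l vectors
    score = np.abs(v).sum(axis=1)           # one magnitude score per vector
    k = max(1, int(density * score.size))   # number of vectors to keep
    thresh = np.partition(score.ravel(), -k)[-k]
    mask = (score >= thresh)[:, None, :]    # broadcast over the l axis
    return (v * mask).reshape(rows, cols)

W = synthetic_vector_wise(128, 256, density=0.10, l=32)
```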
“…However, just as happened in dense computation [56], there is an emerging trend of template-based implementations for DL routines. A new wave of third-party kernels has surfaced, providing implementations for different sparse formats, such as Shfl-BW [24] and VENOM [5]. Such solutions expose as configurable parameters the whole set of variables described in Figure 1, such as thread-block, warp, and MMA tile shapes, as well as the batch size.…”
Section: Emerging Trend: Templated Libraries
confidence: 99%
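To make that tuning space concrete, here is a small Python sketch enumerating the kind of knobs such templated libraries expose; the specific tile shapes are common Tensor Core choices picked for illustration, not values taken from Shfl-BW or VENOM.

```python
from itertools import product

TB_TILES    = [(128, 128), (128, 64), (64, 64)]  # thread-block tile (M, N)
WARP_TILES  = [(64, 64), (64, 32), (32, 32)]     # warp tile (M, N)
MMA_SHAPES  = [(16, 8, 16), (16, 8, 8)]          # MMA instruction (m, n, k)
BATCH_SIZES = [1, 8, 32]

def is_valid(tb, warp):
    # a thread-block tile must hold an integer number of warp tiles
    return tb[0] % warp[0] == 0 and tb[1] % warp[1] == 0

configs = [(tb, w, mma, b)
           for tb, w, mma, b in product(TB_TILES, WARP_TILES,
                                        MMA_SHAPES, BATCH_SIZES)
           if is_valid(tb, w)]
print(f"{len(configs)} candidate kernel configurations")
```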
“…However, achieving substantial gains in throughput or training time on SIMD- and systolic-array-based architectures like GPUs and TPUs remains challenging. Notably, the SIMD nature of GPUs, including the incorporation of Tensor Cores, presents limitations for efficiently accelerating sparse operations (Gale et al. 2020; Huang et al. 2022). Consequently, achieving benefits from sparse matrix-matrix multiplication (SpMM) in the absence of specific sparsity patterns proves difficult, except at extreme sparsity levels of around 99% and higher (Hoefler et al. 2021).…”
Section: Related Work
confidence: 99%
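The irregularity this excerpt refers to is visible even in a reference implementation. The sketch below is a plain CSR SpMM in Python/NumPy (an illustration, not any cited kernel); the data-dependent gather B[indices[...]] is exactly the access pattern that maps poorly onto SIMD lanes and Tensor Cores.

```python
import numpy as np

def spmm_csr(indptr, indices, data, B):
    # Reference CSR SpMM: C = A @ B with A stored as (indptr, indices, data).
    M = len(indptr) - 1
    C = np.zeros((M, B.shape[1]), dtype=B.dtype)
    for i in range(M):
        lo, hi = indptr[i], indptr[i + 1]
        # Only row i's nonzeros contribute FLOPs, but the gather below
        # touches scattered rows of B, defeating coalesced/SIMD access.
        C[i] = data[lo:hi] @ B[indices[lo:hi]]
    return C
```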
“…Consequently, achieving benefits from sparse matrix-matrix multiplication (SpMM) in the absence of specific sparsity patterns proves difficult, except at extreme sparsity levels of around 99% and higher (Hoefler et al. 2021). To address this, techniques have emerged to introduce and exploit structure within sparse matrices (Huang et al. 2022; Mishra et al. 2021; Wang 2020). For instance, NVIDIA's 2:4 sparsity support optimally utilizes Tensor Cores (Mishra et al. 2021).…”
Section: Related Work
confidence: 99%
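The 2:4 pattern mentioned above is simple to state in code: keep the two largest-magnitude entries in every contiguous group of four. Below is a minimal magnitude-pruning sketch in Python/NumPy, an assumption-level illustration rather than NVIDIA's official pruning tooling.

```python
import numpy as np

def prune_2_4(w):
    # Keep the 2 largest-magnitude values in each group of 4 along the
    # last axis and zero the rest: the pattern Sparse Tensor Cores accept.
    assert w.shape[-1] % 4 == 0, "last dimension must be a multiple of 4"
    g = w.reshape(-1, 4)
    order = np.argsort(np.abs(g), axis=1)                 # ascending by |w|
    mask = np.ones_like(g, dtype=bool)
    np.put_along_axis(mask, order[:, :2], False, axis=1)  # drop the 2 smallest
    return np.where(mask, g, 0).reshape(w.shape)
```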