Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing 2018
DOI: 10.1145/3208040.3208062
Efficient sparse-matrix multi-vector product on GPUs

Cited by 59 publications (37 citation statements)
References 25 publications
“…3b. The dense blocks are stored as dense matrices in row-major order with empty cells filled with zeros, and the sparse block is stored in Compressed Sparse Row (CSR) format [22]. We took inspiration from the data reorganization idea proposed in [25] and developed a new algorithm for extracting dense blocks from a sparse matrix.…”
Section: Overview of OptPrune
confidence: 99%
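For readers unfamiliar with the CSR layout the quote refers to, the sketch below shows its three arrays (row pointers, column indices, values) and a basic SpMM kernel over dense row-major blocks. It is a minimal illustration, not the cited paper's implementation; the kernel name `csr_spmm` and the one-thread-per-output-element mapping are assumptions made for clarity.

```cuda
// Minimal CSR SpMM sketch (illustrative; not the cited paper's code).
#include <cstdio>
#include <cuda_runtime.h>

// One thread per (row, output column) of C = A * B, where A is an
// n_rows x n_cols CSR matrix and B, C are dense row-major with k columns.
__global__ void csr_spmm(int n_rows, const int *row_ptr, const int *col_idx,
                         const float *vals, const float *B, float *C, int k) {
    int row = blockIdx.x;
    int j = blockIdx.y * blockDim.x + threadIdx.x;
    if (row >= n_rows || j >= k) return;
    float acc = 0.0f;
    for (int p = row_ptr[row]; p < row_ptr[row + 1]; ++p)
        acc += vals[p] * B[col_idx[p] * k + j];   // gather one sparse row
    C[row * k + j] = acc;
}

int main() {
    // 2 x 3 sparse A = [[1,0,2],[0,3,0]] in CSR; B is 3 x 2 dense row-major.
    int h_row_ptr[] = {0, 2, 3};
    int h_col_idx[] = {0, 2, 1};
    float h_vals[]  = {1, 2, 3};
    float h_B[] = {1, 2, 3, 4, 5, 6};
    int *d_rp, *d_ci; float *d_v, *d_B, *d_C;
    cudaMalloc(&d_rp, sizeof h_row_ptr); cudaMalloc(&d_ci, sizeof h_col_idx);
    cudaMalloc(&d_v, sizeof h_vals); cudaMalloc(&d_B, sizeof h_B);
    cudaMalloc(&d_C, 4 * sizeof(float));
    cudaMemcpy(d_rp, h_row_ptr, sizeof h_row_ptr, cudaMemcpyHostToDevice);
    cudaMemcpy(d_ci, h_col_idx, sizeof h_col_idx, cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, h_vals, sizeof h_vals, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, sizeof h_B, cudaMemcpyHostToDevice);
    csr_spmm<<<dim3(2, 1), 32>>>(2, d_rp, d_ci, d_v, d_B, d_C, 2);
    float h_C[4];
    cudaMemcpy(h_C, d_C, sizeof h_C, cudaMemcpyDeviceToHost);
    printf("C = [[%g, %g], [%g, %g]]\n", h_C[0], h_C[1], h_C[2], h_C[3]);
    return 0;
}
```

High-performance SpMM kernels tile the dense operand and reuse it through shared memory; this sketch omits those optimizations to keep the data layout visible.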
“…As explained in the background section, sparse convolutions can be implemented as SpMM. Although previous works have studied SpMM on GPUs [22,23,25], their optimization techniques mainly target large sparse matrices, with at least 10,000 rows and columns, found in scientific computing applications, and they cannot deliver good performance for sparse convolutions, where the number of convolution kernels is usually smaller than 1,000. In fact, we adopted a state-of-the-art implementation of SpMM from [25] for sparse convolution with real-world pruned models from [46], and we found that the sparse convolutions do not run much faster (and can even be slower) than the original dense convolutions implemented as GEMM.…”
Section: Implementing Sparse Convolutions with GEMM
confidence: 99%
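The conv-to-SpMM mapping this quote relies on is the standard im2col lowering: with K filters, C input channels, and R x S kernels, the pruned weight matrix has shape K x (C*R*S) and is multiplied by an im2col'd input of shape (C*R*S) x (number of output positions), so K, often under 1,000, becomes the row count of the sparse matrix. Below is a minimal host-side sketch of im2col under those assumptions (single image, stride 1, no padding); the function name and layout are illustrative, not taken from the cited work.

```cuda
// Host-side im2col sketch showing why a pruned convolution becomes an
// SpMM with only K rows (K = number of filters).
#include <cstdio>
#include <vector>

// input x: C x H x W (row-major), kernel R x S, stride 1, no padding.
// Returns cols: (C*R*S) x (P*Q) with P = H-R+1, Q = W-S+1.
static std::vector<float> im2col(const std::vector<float>& x,
                                 int C, int H, int W, int R, int S) {
    int P = H - R + 1, Q = W - S + 1;
    std::vector<float> cols((size_t)C * R * S * P * Q);
    for (int c = 0; c < C; ++c)
      for (int r = 0; r < R; ++r)
        for (int s = 0; s < S; ++s)
          for (int p = 0; p < P; ++p)
            for (int q = 0; q < Q; ++q)
              cols[(((size_t)(c * R + r) * S + s) * P + p) * Q + q] =
                  x[((size_t)c * H + p + r) * W + q + s];
    return cols;
}

int main() {
    // C=1, H=W=3 input; R=S=2 kernel -> cols is 4 x 4.
    std::vector<float> x = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    auto cols = im2col(x, 1, 3, 3, 2, 2);
    for (size_t i = 0; i < cols.size(); ++i)          // 4 output positions
        printf("%g%c", cols[i], (i % 4 == 3) ? '\n' : ' ');
    return 0;
}
```

Multiplying a K x 4 pruned weight matrix against these columns is exactly the small-row-count SpMM the quote says existing GPU kernels handle poorly.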
“…Hong et al. proposed a new sparse-matrix format and SpMM algorithm named Row-Segmented-SpMM (RS-SpMM) [9]. The sparse matrix is divided into two groups.…”
Section: Related Work: SpMM for GPU
confidence: 99%
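The quote is truncated before it describes the two groups. As a hedged illustration only, the sketch below splits the rows of a CSR matrix into a "heavy" and a "light" group using a per-row nonzero threshold; the actual RS-SpMM partitioning in [9] operates on row segments rather than whole rows, so this approximates the idea rather than reproducing the paper's algorithm.

```cuda
// Host-side sketch of a two-way matrix split (assumption: per-row nnz
// threshold; RS-SpMM itself partitions by row segments, see [9]).
#include <cstdio>
#include <vector>

// Rows with at least `threshold` nonzeros go to `heavy` (amenable to
// dense-like processing); the rest stay in `light` (kept in CSR).
static void split_rows(const std::vector<int>& row_ptr, int threshold,
                       std::vector<int>& heavy, std::vector<int>& light) {
    for (int r = 0; r + 1 < (int)row_ptr.size(); ++r) {
        int nnz = row_ptr[r + 1] - row_ptr[r];
        (nnz >= threshold ? heavy : light).push_back(r);
    }
}

int main() {
    // row_ptr for a 4-row matrix with 6, 1, 5, and 0 nonzeros per row.
    std::vector<int> row_ptr = {0, 6, 7, 12, 12};
    std::vector<int> heavy, light;
    split_rows(row_ptr, /*threshold=*/4, heavy, light);
    printf("heavy rows: %zu, light rows: %zu\n", heavy.size(), light.size());
    return 0;
}
```

Each group can then be handed to a kernel specialized for its density, which is the motivation behind the two-group design.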
“…These sparse formats, which are suitable for cache-aware CPU platforms, usually provide poor performance on GPUs. For this reason, new formats have been developed to allow an efficient implementation of a sparse matrix-vector product on GPUs [17,68]. A sparse format included in cuSPARSE that is specifically designed for use on GPUs is HYB.…”
Section: CUDA Software
confidence: 99%
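cuSPARSE's HYB format combines an ELL part (a fixed number of nonzeros per row, padded for regular access) with a COO part holding the overflow. The sketch below builds such a two-part structure from CSR on the host; the struct layout, field names, and fixed-width policy are assumptions made for illustration and do not reproduce cuSPARSE's opaque internal representation.

```cuda
// Host-side sketch of the HYB idea: ELL for the regular part, COO for
// the irregular overflow (layout is illustrative, not cuSPARSE's).
#include <cstdio>
#include <vector>

struct Hyb {
    int n_rows, ell_width;
    std::vector<int>   ell_col;              // n_rows * ell_width, -1 = pad
    std::vector<float> ell_val;              // n_rows * ell_width, 0 = pad
    std::vector<int>   coo_row, coo_col;     // overflow entries
    std::vector<float> coo_val;
};

// The first `width` nonzeros of each CSR row go to ELL; the rest spill
// into COO, keeping the ELL part perfectly rectangular.
static Hyb csr_to_hyb(const std::vector<int>& row_ptr,
                      const std::vector<int>& col_idx,
                      const std::vector<float>& vals, int width) {
    int n = (int)row_ptr.size() - 1;
    Hyb h{n, width,
          std::vector<int>((size_t)n * width, -1),
          std::vector<float>((size_t)n * width, 0.0f), {}, {}, {}};
    for (int r = 0; r < n; ++r)
        for (int p = row_ptr[r]; p < row_ptr[r + 1]; ++p) {
            int k = p - row_ptr[r];
            if (k < width) {
                h.ell_col[(size_t)r * width + k] = col_idx[p];
                h.ell_val[(size_t)r * width + k] = vals[p];
            } else {
                h.coo_row.push_back(r);
                h.coo_col.push_back(col_idx[p]);
                h.coo_val.push_back(vals[p]);
            }
        }
    return h;
}

int main() {
    std::vector<int> row_ptr = {0, 2, 3, 6};
    std::vector<int> col_idx = {0, 2, 1, 0, 1, 2};
    std::vector<float> vals  = {1, 2, 3, 4, 5, 6};
    Hyb h = csr_to_hyb(row_ptr, col_idx, vals, /*width=*/2);
    printf("ELL slots: %zu, COO overflow: %zu\n",
           h.ell_val.size(), h.coo_val.size());
    return 0;
}
```

The padded ELL part gives GPU threads uniform, coalesced work per row, while the small COO tail absorbs the irregular rows that would otherwise force excessive padding.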