2016
DOI: 10.1155/2016/4596943

Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU

Abstract: Sparse matrix-vector multiplication (SpMV) is an important operation in computational science and needs to be accelerated because it often represents the dominant cost in many widely used iterative methods and eigenvalue problems. We achieve this objective by proposing a novel SpMV algorithm based on the compressed sparse row (CSR) format on the GPU. Our method dynamically assigns different numbers of rows to each thread block and executes different optimization implementations on the basis of the number of rows it invo…
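For readers unfamiliar with the format, the following is a minimal sequential sketch of what SpMV on a CSR matrix computes. It is a reference for the operation itself, not the algorithm proposed in the paper; the array names `row_ptr`, `col_idx`, and `values` are conventional but are assumptions here, since the abstract does not name them.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Return y = A @ x for a matrix A stored in CSR form.

    row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r inside
    col_idx (column indices) and values (the nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Illustrative 3x3 matrix (made up for this sketch):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

The inner loop length varies from row to row, which is why GPU implementations differ mainly in how they map rows and nonzeros to threads.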

Cited by 4 publications (3 citation statements)
References 29 publications
“…Based on the CSR storage format, Bell and Garland proposed two classic parallel algorithms: CSR-Scalar [6] and CSR-Vector [6]. Lu et al [7] suggested filling the CSR array to optimize CSR-Scalar, achieving a 30% improvement in memory access performance; Dehnavi et al [8] put forward a prefetching CSR method that divides the non-zero elements of the matrix into blocks of the same size and allocates them to a GPU-like accelerator for computation; Greathouse and Daga [9] came up with the CSR-Adaptive algorithm, which dynamically selects between the CSR-Stream algorithm and the CSR-Vector algorithm according to the number of non-zero elements in each row, and uses effective reduction techniques for performance improvement; Gao et al [10] presented a PCSR algorithm to enhance the performance of the kernel by fully merging memory access to the CSR array, and made optimization on the basis of the algorithm, thus proposing the IPCSR algorithm, which reduces two kernels in PCSR to one while maintaining merged accesses to CSR arrays, saving the cost of loading global memory.…”
Section: SpMV Algorithm Based on CSR (mentioning)
confidence: 99%
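The two classic mappings named in this statement can be sketched as a sequential CPU emulation. This is a hedged illustration of the general idea (one thread per row for CSR-Scalar, one warp per row with a final reduction for CSR-Vector), not Bell and Garland's code; the function names and the `warp_size` parameter are hypothetical.

```python
from typing import List, Sequence

WARP_SIZE = 32  # assumed warp width; must be a power of two for the reduction


def csr_scalar(row_ptr: Sequence[int], col_idx: Sequence[int],
               values: Sequence[float], x: Sequence[float]) -> List[float]:
    """CSR-Scalar pattern: each outer iteration stands in for one thread
    that walks an entire row on its own."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def csr_vector(row_ptr: Sequence[int], col_idx: Sequence[int],
               values: Sequence[float], x: Sequence[float],
               warp_size: int = WARP_SIZE) -> List[float]:
    """CSR-Vector pattern: each outer iteration stands in for one warp.
    Lanes stride across the row's nonzeros, then partial sums are
    combined with a tree reduction."""
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * warp_size
        for lane in range(warp_size):
            # Lane `lane` handles elements start+lane, start+lane+warp_size, ...
            for k in range(start + lane, end, warp_size):
                partial[lane] += values[k] * x[col_idx[k]]
        stride = warp_size // 2
        while stride > 0:  # tree reduction across lanes
            for lane in range(stride):
                partial[lane] += partial[lane + stride]
            stride //= 2
        y.append(partial[0])
    return y


# Made-up test matrix: a long row (40 nonzeros), an empty row, a short row.
row_ptr = [0, 40, 40, 43]
col_idx = list(range(40)) + [0, 1, 2]
values = [1.0] * 43
x = [1.0] * 40
y_scalar = csr_scalar(row_ptr, col_idx, values, x)
y_vector = csr_vector(row_ptr, col_idx, values, x)
```

In the warp-per-row mapping, neighbouring lanes read neighbouring entries of `values` and `col_idx`, which is the access pattern that favours long rows; short rows leave most lanes idle.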
“…Perfect-CSR. PCSR (Gao et al, 2016) consists of two main stages. The first stage launches as many blocks as the number of nonzero entries divided by the block dimensions (i.e., one thread per nonzero entry).…”
Section: CSR-Adaptive (mentioning)
confidence: 99%
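The launch configuration described for the first stage (one thread per nonzero entry) can be sketched as a sequential emulation. This is an assumption-laden illustration: the statement is truncated, so the rounding-up of the block count, the default `block_dim` of 256, and the function names are all hypothetical, and the second stage of PCSR is deliberately not reproduced.

```python
from typing import List, Sequence


def grid_size(nnz: int, block_dim: int) -> int:
    """Number of blocks needed so that every nonzero gets one thread.
    Rounding up is an assumption; the quoted text only says 'divided by'."""
    return (nnz + block_dim - 1) // block_dim


def elementwise_products(col_idx: Sequence[int], values: Sequence[float],
                         x: Sequence[float], block_dim: int = 256) -> List[float]:
    """Emulate a one-thread-per-nonzero stage: thread i computes
    values[i] * x[col_idx[i]] and stores it at position i."""
    nnz = len(values)
    products = [0.0] * nnz
    for block in range(grid_size(nnz, block_dim)):
        for thread in range(block_dim):
            i = block * block_dim + thread  # global thread index
            if i < nnz:  # guard for the partially filled last block
                products[i] = values[i] * x[col_idx[i]]
    return products


# Same made-up 3x3 matrix shape as a typical CSR example; 6 nonzeros.
col_idx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0]
products = elementwise_products(col_idx, values, x, block_dim=4)
n_blocks = grid_size(len(values), 4)
```

Because thread `i` touches entry `i` of both `values` and `col_idx`, consecutive threads read consecutive memory locations regardless of how nonzeros are distributed over rows.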
“…The first step in the empirical evaluation of our proposal is the execution of all the selected variants of SpMV and a preliminary assessment of results. Our initial experiment compares the runtimes of the cuSparse CsrMV routine with the row-based and merge-path algorithms (which we refer to as cuS_RB and cuS_merge, respectively); the CUSP implementation of CSR-Vector (cusp_vect); an implementation of Liu and Vinter (2015) (bhSparse) published by the authors in their GitHub repository; and our own implementations of Liu and Schmidt (2018) (light), Gao et al (2016) (pcsr), and Greathouse and Daga (2014) (adaptive), based on codes found online.…”
Section: Experimental Evaluation (mentioning)
confidence: 99%