2015
DOI: 10.1137/130948811

GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging

Abstract: We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using subwarps of threads to realize an early compression effect which reduces the overhead of global memory accesses. The performance is compared with a parallel CPU implementation a…
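The kernel summarized in the abstract builds each output row by repeatedly merging small groups of sorted sparse rows, summing entries that share a column index as it goes. Below is a minimal serial C++ sketch of that pairwise merge-with-aggregation step under an assumed (column index, value) row layout; it illustrates the idea only and is not the paper's subwarp-based GPU kernel.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One sparse row stored as (column index, value) pairs sorted by column,
// as in a CSR matrix. The type name and layout are illustrative assumptions.
using SparseRow = std::vector<std::pair<std::uint32_t, double>>;

// Merge two sorted sparse rows; entries with the same column index are
// summed immediately ("aggregated on the fly") rather than kept as
// duplicates. In row-wise SpGEMM the inputs would be rows of B already
// scaled by the matching nonzeros of A.
SparseRow mergeRows(const SparseRow& a, const SparseRow& b) {
    SparseRow out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first) {
            out.push_back(a[i++]);
        } else if (b[j].first < a[i].first) {
            out.push_back(b[j++]);
        } else {  // duplicate column index: combine the two contributions
            out.emplace_back(a[i].first, a[i].second + b[j].second);
            ++i;
            ++j;
        }
    }
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```

Merging the contributing rows in small groups, rather than expanding all partial products first, keeps intermediate results compressed; this is the early compression effect that, per the abstract, reduces the overhead of global memory accesses.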

Cited by 75 publications (64 citation statements) | References 31 publications
“…We derive the scattering map by automated segmentation of the µCT data and assigning known scattering coefficients of several tissue types (lung, bone, skin, fat, and remaining soft tissue) [24]. Subsequently, we reconstruct an absorption map from the optical raw data which is particularly important for well-perfused organs such as the heart and the liver [13,20].…”
Section: Discussion
confidence: 99%
“…There has been a flurry of activity in developing algorithms and implementations of SpGEMM for Graphics Processing Units (GPUs). Among those, the algorithm of Gremse et al [26] uses the row-wise formulation of SpGEMM. By contrast, Dalton et al [18] uses the data-parallel ESC (expansion, sorting, and contraction) formulation, which is based on outer products.…”
Section: Notation
confidence: 99%
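For context on the excerpt above: in the row-wise formulation, row i of C = AB is the sum of the rows of B selected and scaled by the nonzeros of row i of A (Gustavson's formulation). The following is a minimal serial C++ sketch under assumed CSR-like types, not code from either cited implementation; the merge-based RMerge algorithm realizes the same formulation but replaces the accumulator with iterative row merging on GPU subwarps.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Illustrative CSR-like representation: each row is a list of
// (column index, value) pairs sorted by column; dimensions are implicit.
using Row = std::vector<std::pair<std::uint32_t, double>>;
using SparseMatrix = std::vector<Row>;

// Row-wise (Gustavson-style) SpGEMM: row i of C = A*B is the sum of the
// rows B[k] scaled by A(i,k), taken over the nonzeros k of row i of A.
// Here an ordered map accumulates duplicate column indices; merge-based
// approaches replace this accumulator with iterative row merging.
SparseMatrix spgemmRowWise(const SparseMatrix& A, const SparseMatrix& B) {
    SparseMatrix C(A.size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        std::map<std::uint32_t, double> acc;
        for (const auto& [k, aik] : A[i])       // nonzeros of row i of A
            for (const auto& [j, bkj] : B[k])   // corresponding row of B
                acc[j] += aik * bkj;
        C[i].assign(acc.begin(), acc.end());    // already sorted by column
    }
    return C;
}
```

The ESC formulation attributed to Dalton et al. in the excerpt instead expands all partial products, sorts them by row and column, and contracts duplicates in a final pass.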
“…The single precision and double precision absolute performance of the SpGEMM algorithms that compute C = A² are shown in Figures 8 and 9, respectively. Four GPU methods from CUSP v0.4.0, cuSPARSE v6.5, RMerge [16] and bhSPARSE are evaluated on three GPUs: nVidia GeForce GTX Titan Black, nVidia GeForce GTX 980 and AMD Radeon R9 290X. One CPU method in Intel MKL v11.0 is evaluated on Intel Xeon E5-2630 CPU.…”
Section: Performance Comparison for Matrix Squaring
confidence: 99%
“…Previous GPU SpGEMM methods [2,12,13,14,15,16] have proposed a few solutions for the above problems and demonstrated relatively good time and space complexity. However, the experimental results showed that they either only work best for fairly regular sparse matrices [12,13,16], or bring extra high memory overhead for matrices with some specific sparsity structures [2,14,15]. Moreover, in the usual sense, none of these methods can constantly outperform well optimized SpGEMM approach [17] for multicore CPUs.…”
Section: Introduction
confidence: 99%