Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores 2015
DOI: 10.1145/2712386.2712387

Energy efficiency and performance frontiers for sparse computations on GPU supercomputers

Abstract: In this paper we unveil some energy efficiency and performance frontiers for sparse computations on GPU-based supercomputers. To do this, we consider state-of-the-art implementations of the sparse matrix-vector (SpMV) product in libraries like cuSPARSE, MKL, and MAGMA, and their use in the LOBPCG eigen-solver. LOBPCG is chosen as a benchmark for this study as it combines an interesting mix of sparse and dense linear algebra operations with potential for hardware-aware optimizations. Most notably, LOBPCG includ…
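For orientation, the SpMV kernel at the center of this study computes y = A*x for a sparse matrix A. The sketch below is a minimal scalar CSR kernel in CUDA (one thread per row), shown only to make the memory-access pattern concrete; the cuSPARSE, MKL, and MAGMA implementations benchmarked in the paper are far more heavily optimized, and the array names here (row_ptr, col_idx, vals) are illustrative assumptions.

// Minimal scalar CSR SpMV sketch: y = A*x with one thread per row.
// Illustrative only; not the cuSPARSE/MKL/MAGMA code benchmarked in the paper.
__global__ void spmv_csr(int n_rows,
                         const int    *row_ptr,  // n_rows+1 offsets into col_idx/vals
                         const int    *col_idx,  // column index of each nonzero
                         const double *vals,     // value of each nonzero
                         const double *x,        // dense input vector
                         double       *y)        // dense output vector
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

Each thread traverses its row serially, so rows of very different lengths cause load imbalance; this is one of the problems that sliced formats such as SELL-C and SELL-P, discussed in the citing papers below, are designed to address.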

Cited by 10 publications (8 citation statements) | References 34 publications
“…Röhrig-Zöllner et al [19] discuss performance optimization techniques for the block Jacobi-Davidson method to compute a few eigenpairs of large-scale sparse matrices, and report reduced time-to-solution using block methods over single-vector counterparts for quantum mechanics problems and PDEs. Finally, Anzt et al [20] describe an SpMM implementation based on the SELL-C matrix format, and show that performance improvements in the SpMM kernel can translate into performance improvements in a block eigensolver running on GPUs.…”
Section: Introduction
confidence: 99%
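To make the SELL-C reference above concrete, here is a hedged sketch of the single-vector (SpMV) variant of a SELL-C kernel: rows are grouped into slices of C consecutive rows, each slice is padded to its own longest row, and values are stored column-major within the slice so that a warp reads them coalesced. The SpMM kernel of Anzt et al [20] additionally processes a block of vectors at once; the names (slice_ptr, cols, vals) and the launch convention here are assumptions, not their code.

#define SLICE 32  // slice height C, typically the warp size

// SELL-C SpMV sketch: launch with gridDim.x = number of slices and
// blockDim.x = SLICE. Padding entries carry value 0.0 and a valid column
// index, so they contribute nothing to the sum.
__global__ void spmv_sellc(int n_rows,
                           const int    *slice_ptr,  // n_slices+1 offsets into cols/vals
                           const int    *cols,
                           const double *vals,
                           const double *x,
                           double       *y)
{
    int row = blockIdx.x * SLICE + threadIdx.x;  // one thread per row
    if (row < n_rows) {
        int offset = slice_ptr[blockIdx.x];
        int width  = (slice_ptr[blockIdx.x + 1] - offset) / SLICE;  // padded row length
        double sum = 0.0;
        for (int j = 0; j < width; ++j) {
            int idx = offset + j * SLICE + threadIdx.x;  // column-major: coalesced reads
            sum += vals[idx] * x[cols[idx]];
        }
        y[row] = sum;
    }
}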
“…They showed that in terms of GFLOPS/W, the SpMV kernel performed better on the Intel Sandy Bridge than on the NVIDIA GPU, but they considered only the CSR format and presented results for only one type of sparse matrix (R-MAT). In [3], the authors unveiled some energy efficiency and performance frontiers for sparse computations on GPU-based supercomputers. LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient) was chosen as a benchmark as it combines sparse and dense linear algebra operations, including the SpMV kernel.…”
Section: Related Work
confidence: 99%
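As a reminder of how the GFLOPS/W figures in the comparison above are derived: SpMV performs roughly 2*nnz floating-point operations (one multiply and one add per stored nonzero), so energy efficiency follows from a runtime and an average-power measurement. A small host-side helper, with hypothetical inputs:

// GFLOPS/W for an SpMV run: 2*nnz flops over measured runtime and average power.
// The measurement inputs are hypothetical; the cited papers obtain them from
// hardware counters or external power meters.
double spmv_gflops_per_watt(long long nnz, double seconds, double watts)
{
    double gflops = (2.0 * nnz / seconds) * 1e-9;  // sustained GFLOP/s
    return gflops / watts;                         // energy efficiency
}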
“…As CG and its preconditioned variant consist of a combination of matrix-vector and vector operations, the optimization for coalesced memory reads boils down to the sparse matrix-vector product. There exists extensive work on optimizing storage formats and sparse matrix-vector performance for GPUs, and in this work we focus on using the CSR, ELL, and SELL-P formats, known to provide good performance [4]. To reduce the memory traffic, it is necessary to use algorithm-specific kernels that apply kernel fusion to the basic linear algebra operations whenever possible [1].…”
Section: Sparse Linear Algebra on GPUs
confidence: 99%
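A hedged illustration of the kernel fusion mentioned in the last statement: in CG, the residual update r = r - alpha*A*p and the subsequent dot product (r, r) each sweep over r, so fusing them into one kernel halves the memory traffic for that vector. The sketch below is one plausible fusion under assumed names, not the kernels from [1]; the host (or a second small reduction kernel) still sums the per-block partial results.

// Fused residual update and partial dot product: one pass over r instead of two.
// Launch with blockDim.x = 256; partial has gridDim.x entries to be summed afterwards.
__global__ void fused_update_dot(int n, double alpha,
                                 const double *Ap,      // A*p from the preceding SpMV
                                 double *r,             // residual, updated in place
                                 double *partial)       // per-block partial sums of (r,r)
{
    __shared__ double cache[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    double local = 0.0;
    if (i < n) {
        double ri = r[i] - alpha * Ap[i];  // residual update
        r[i]  = ri;
        local = ri * ri;                   // contribution to dot(r, r)
    }
    cache[threadIdx.x] = local;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // block-level tree reduction
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];
}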