2015
DOI: 10.1007/978-3-662-48096-0_46

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications

Abstract: Leveraging optimization techniques (e.g., register blocking and double buffering) introduced in the context of KBLAS, a high-performance Level 2 BLAS library for GPUs, the authors implement dense matrix-vector multiplications within a sparse-block structure. While these optimizations are important for high-performance dense kernel executions, they are even more critical when dealing with sparse linear algebra operations. The most time-consuming phase of many multicomponent applications, such as models…
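The core idea in the abstract, applying small dense matrix-vector kernels to the nonzero blocks of a block-sparse (BSR-style) matrix, can be sketched in plain Python. This is an illustrative reconstruction rather than the paper's CUDA implementation; the name `bsr_spmv` and the array layout are assumptions made for the example:

```python
def bsr_spmv(block_ptr, block_col, blocks, x, b):
    """y = A @ x for a matrix stored in a BSR-style layout.

    block_ptr : CSR-style row pointers over block rows
    block_col : block-column index of each stored block
    blocks    : list of b*b dense blocks, each row-major
    x         : input vector of length n_block_cols * b
    b         : block size
    """
    n_brows = len(block_ptr) - 1
    y = [0.0] * (n_brows * b)
    for bi in range(n_brows):
        for k in range(block_ptr[bi], block_ptr[bi + 1]):
            bj = block_col[k]
            blk = blocks[k]
            # dense b x b matrix-vector product for this block,
            # accumulated into the output segment of block row bi
            for i in range(b):
                acc = 0.0
                for j in range(b):
                    acc += blk[i * b + j] * x[bj * b + j]
                y[bi * b + i] += acc
    return y
```

On a GPU, each inner dense product is where techniques like register blocking pay off; here the nesting only illustrates the data access pattern.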

Cited by 10 publications (21 citation statements)
References 17 publications
“…KAUST BLAS (KBLAS) is an open-source library that provides highly optimized implementations for a subset of BLAS routines on NVIDIA GPUs as well as x86 architectures. In particular, the authors have already demonstrated significant performance gains for IP TRSM and TRMM against cuBLAS IP and MAGMA OOP implementations on a single NVIDIA GPU.…”
Section: Related Work
confidence: 99%
“…However, this distribution, which has been used in [2], does not work well for all matrices, unless the row lengths are balanced, as proposed in Section 5.5. Block rows of the matrix, preferably reordered according to their lengths, will be distributed among GPUs in a 1D cyclic manner.…”
Section: Multi-gpu Kernelsmentioning
confidence: 99%
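The 1D cyclic distribution this citation describes, with block rows optionally reordered by length so that round-robin dealing mixes long and short rows across devices, can be sketched as follows. The function name and signature are hypothetical; real multi-GPU code would also partition the matrix data itself:

```python
def cyclic_distribution(row_lengths, n_gpus, reorder=True):
    """Assign block rows to GPUs in a 1D cyclic manner.

    row_lengths : nonzero-block count of each block row
    n_gpus      : number of devices
    reorder     : sort rows by descending length first, so that
                  round-robin dealing balances total work
    Returns a list mapping each GPU id to the block rows it owns.
    """
    rows = list(range(len(row_lengths)))
    if reorder:
        rows.sort(key=lambda r: row_lengths[r], reverse=True)
    owners = [[] for _ in range(n_gpus)]
    for i, r in enumerate(rows):
        owners[i % n_gpus].append(r)  # deal rows out cyclically
    return owners
```

Without the reordering, a plain cyclic split can leave one GPU with most of the long rows, which is the imbalance the citation warns about.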
“…Such matrices are not necessarily block-sparse; however, we are interested in their structures, as inherited from spatial discretization. KSPARSE using BSR format [2]. Such an approach enables us to test the performance of the proposed kernels against a wide range of sparsity patterns.…”
Section: System Setup
confidence: 99%
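Converting a general matrix into the BSR layout mentioned here, keeping only blocks that contain at least one nonzero, can be sketched in plain Python. `dense_to_bsr` is a hypothetical helper for illustration; production code would use a library routine such as SciPy's `bsr_matrix`:

```python
def dense_to_bsr(A, b):
    """Convert a dense matrix (list of rows) into BSR-style arrays.

    Only b x b blocks with at least one nonzero are stored.
    Matrix dimensions are assumed to be multiples of b.
    Returns (block_ptr, block_col, blocks) with blocks row-major.
    """
    n, m = len(A), len(A[0])
    block_ptr, block_col, blocks = [0], [], []
    for bi in range(n // b):
        for bj in range(m // b):
            # flatten the b x b block at block coordinates (bi, bj)
            blk = [A[bi * b + i][bj * b + j]
                   for i in range(b) for j in range(b)]
            if any(v != 0 for v in blk):
                block_col.append(bj)
                blocks.append(blk)
        block_ptr.append(len(block_col))
    return block_ptr, block_col, blocks
```

This mirrors how a general sparse matrix can be fed to block-sparse kernels regardless of its original sparsity pattern, at the cost of storing explicit zeros inside partially filled blocks.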