2010
DOI: 10.1002/cpe.1658
A new approach for sparse matrix vector product on NVIDIA GPUs

Abstract: The sparse matrix vector product (SpMV) is a key operation in engineering and scientific computing and, hence, it has been subjected to intense research for a long time. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim of maximizing performance. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors.

Cited by 103 publications (87 citation statements)
References 13 publications
“…The challenges of achieving efficient performance on a GPU architecture may justify the extended effort of custom algorithms developed specifically for the strengths of the platform (Vázquez et al., 2010). However, we maintain the original algorithms of the legacy EISPACK implementation for several reasons:…”
Section: Methods
confidence: 99%
“…Both schemes aim to reduce the memory footprint by explicitly storing only the non-zero elements, though the ELLPACK format may store some zero elements to pad all rows to the same length; see Figure 3.3 and Table 3.3 for the resulting overhead. While in general this incurs some additional storage cost, the aligned structure allows more efficient hardware use when targeting streaming processors such as GPUs [108,130].…”
Section: Basic Implementation of the CG Method
confidence: 99%
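The padded ELLPACK layout described in the statement above can be sketched as follows. This is a minimal CPU illustration in NumPy, not the cited GPU implementation: the example matrix, array names (`data`, `cols`), and helper function are hypothetical, chosen only to show how rows are padded to a common length and how the product is formed.

```python
import numpy as np

# Small dense matrix with irregular sparsity (hypothetical example).
A = np.array([[4.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [3.0, 0.0, 5.0]])

# Build ELLPACK arrays: every row is padded with zeros up to the
# length of the longest row, so `data` and `cols` are rectangular.
n = A.shape[0]
max_nnz = max(np.count_nonzero(row) for row in A)
data = np.zeros((n, max_nnz))
cols = np.zeros((n, max_nnz), dtype=int)
for i, row in enumerate(A):
    nz = np.nonzero(row)[0]
    data[i, :len(nz)] = row[nz]
    cols[i, :len(nz)] = nz

def ellpack_spmv(data, cols, x):
    # One row per "thread": the padding zeros multiply harmlessly,
    # and the rectangular layout is what enables aligned (coalesced)
    # memory accesses on streaming processors.
    y = np.zeros(data.shape[0])
    for i in range(data.shape[0]):
        y[i] = np.dot(data[i], x[cols[i]])
    return y

x = np.array([1.0, 2.0, 3.0])
print(ellpack_spmv(data, cols, x))  # equals A @ x
```

The padded zeros are the storage overhead the citing text refers to; they cost extra memory and multiplications but keep every row the same length.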
“…In those cases, the associated memory and computational overheads may result in poor performance, even though coalesced memory access is highly beneficial for streaming processors such as GPUs. ELLR-T [130,131] is a subtle variant of ELLPACK that addresses this problem while maintaining the coalesced memory access pattern of the original layout. In particular, it reduces useless computations with zeros and improves thread load balancing, often resulting in superior performance.…”
Section: Graphics Accelerators
confidence: 99%
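The core idea of ELLR-T that the statement above describes, storing an explicit per-row length so that computation stops at the last real non-zero instead of the padded width, can be sketched as below. This is a simplified single-thread-per-row illustration with hypothetical data; the actual ELLR-T kernel additionally assigns T threads per row, which is not modeled here.

```python
import numpy as np

# ELLPACK-style padded arrays (hypothetical example matrix).
data = np.array([[4.0, 1.0, 0.0],
                 [2.0, 0.0, 0.0],
                 [3.0, 5.0, 6.0]])
cols = np.array([[0, 2, 0],
                 [1, 0, 0],
                 [0, 1, 2]])
# The ELLR addition: the actual number of non-zeros in each row.
rl = np.array([2, 1, 3])

def ellr_spmv(data, cols, rl, x):
    # Each row's loop stops at rl[i] rather than the padded width,
    # skipping the useless multiplications by padding zeros while
    # the rectangular arrays keep the coalesced access pattern.
    y = np.zeros(data.shape[0])
    for i in range(data.shape[0]):
        k = rl[i]
        y[i] = np.dot(data[i, :k], x[cols[i, :k]])
    return y

x = np.array([1.0, 2.0, 3.0])
print(ellr_spmv(data, cols, rl, x))
```

Rows with few non-zeros finish early instead of iterating over padding, which is the load-balancing and wasted-work improvement the citing text attributes to ELLR-T.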