2010
DOI: 10.1002/cpe.1658
A new approach for sparse matrix vector product on NVIDIA GPUs

Abstract: The sparse matrix vector product (SpMV) is a key operation in engineering and scientific computing and, hence, it has been subjected to intense research for a long time. The irregular computations involved in SpMV make its optimization challenging. Therefore, enormous effort has been devoted to devising data formats to store the sparse matrix with the ultimate aim of maximizing performance. Graphics Processing Units (GPUs) have recently emerged as platforms that yield outstanding acceleration factors.

Cited by 103 publications (87 citation statements)
References 13 publications
“…The challenges of achieving efficient performance on a GPU architecture may justify the extended effort of custom algorithms developed specifically for the strengths of the platform (Vázquez et al., 2010). However, we maintain the original algorithms of the legacy EISPACK implementation for several reasons:…”
Section: Methods
confidence: 99%
“…Both schemes aim to reduce the memory footprint by explicitly storing only the non-zero elements, though the ELLPACK format may store some zero elements to pad all rows to the same length; see Figure 3.3 and Table 3.3 for the resulting overhead. While in general this incurs some additional storage cost, the aligned structure allows more efficient hardware use when targeting streaming processors such as GPUs [108,130].…”
Section: Basic Implementation of the CG Method
confidence: 99%
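The padded ELLPACK layout described in the statement above can be sketched as follows. This is a minimal CPU illustration in NumPy, not the cited GPU implementation: the example matrix, array names (`data`, `cols`), and helper function are hypothetical, chosen only to show how rows are padded to a common length and how the product is formed.

```python
import numpy as np

# Small dense matrix with irregular sparsity (hypothetical example).
A = np.array([[4.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [3.0, 0.0, 5.0]])

# Build ELLPACK arrays: every row is padded with zeros up to the
# length of the longest row, so `data` and `cols` are rectangular.
n = A.shape[0]
max_nnz = max(np.count_nonzero(row) for row in A)
data = np.zeros((n, max_nnz))
cols = np.zeros((n, max_nnz), dtype=int)
for i, row in enumerate(A):
    nz = np.nonzero(row)[0]
    data[i, :len(nz)] = row[nz]
    cols[i, :len(nz)] = nz

def ellpack_spmv(data, cols, x):
    # One row per "thread": the padding zeros multiply harmlessly,
    # and the rectangular layout is what enables aligned (coalesced)
    # memory accesses on streaming processors.
    y = np.zeros(data.shape[0])
    for i in range(data.shape[0]):
        y[i] = np.dot(data[i], x[cols[i]])
    return y

x = np.array([1.0, 2.0, 3.0])
print(ellpack_spmv(data, cols, x))  # equals A @ x
```

The padded zeros are the storage overhead the citing text refers to; they cost extra memory and multiplications but keep every row the same length.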
“…In those cases, the associated memory and computational overheads may result in poor performance, even though coalesced memory access is highly beneficial for streaming processors such as GPUs. ELLR-T [130,131] is a subtle variant of ELLPACK that addresses this problem while maintaining the coalesced memory access pattern of the original layout. In particular, it reduces useless computations with zeros and improves thread load balancing, often resulting in superior performance.…”
Section: Graphics Accelerators
confidence: 99%
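The core idea of ELLR-T that the statement above describes, storing an explicit per-row length so that computation stops at the last real non-zero instead of the padded width, can be sketched as below. This is a simplified single-thread-per-row illustration with hypothetical data; the actual ELLR-T kernel additionally assigns T threads per row, which is not modeled here.

```python
import numpy as np

# ELLPACK-style padded arrays (hypothetical example matrix).
data = np.array([[4.0, 1.0, 0.0],
                 [2.0, 0.0, 0.0],
                 [3.0, 5.0, 6.0]])
cols = np.array([[0, 2, 0],
                 [1, 0, 0],
                 [0, 1, 2]])
# The ELLR addition: the actual number of non-zeros in each row.
rl = np.array([2, 1, 3])

def ellr_spmv(data, cols, rl, x):
    # Each row's loop stops at rl[i] rather than the padded width,
    # skipping the useless multiplications by padding zeros while
    # the rectangular arrays keep the coalesced access pattern.
    y = np.zeros(data.shape[0])
    for i in range(data.shape[0]):
        k = rl[i]
        y[i] = np.dot(data[i, :k], x[cols[i, :k]])
    return y

x = np.array([1.0, 2.0, 3.0])
print(ellr_spmv(data, cols, rl, x))
```

Rows with few non-zeros finish early instead of iterating over padding, which is the load-balancing and wasted-work improvement the citing text attributes to ELLR-T.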