Proceedings of the 2007 ACM/IEEE Conference on Supercomputing 2007
DOI: 10.1145/1362622.1362674
Optimization of sparse matrix-vector multiplication on emerging multicore platforms

Abstract: We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) -one of the most heavily used kernels in scientific computing -across a broad spec…

Cited by 416 publications (297 citation statements)
References 16 publications
“…On co-processors composed of a large number of lightweight single instruction, multiple data (SIMD) units, this problem can heavily degrade the performance of the SpMV operation. Even though many strategies, such as vectorization [1,2,13], data streaming [14], memory coalescing [33], static or dynamic binning [14,15], Dynamic Parallelism [15], and dynamic row distribution [19], have been proposed for the row block method, it is still impossible to achieve nearly perfect load balancing in the general sense, simply because row sizes are irregular and unpredictable.…”
Section: CSR Format and CSR-based SpMV Algorithms
confidence: 99%
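The load-balancing problem described above follows directly from how CSR stores a matrix: each row owns a contiguous slice of the nonzero arrays, and those slices can differ wildly in length. A minimal sketch (our own illustrative names, not code from the cited works) of the CSR-based SpMV kernel the statement refers to:

```python
# Minimal CSR SpMV sketch (illustrative; not taken from the cited papers).
# row_ptr[i]..row_ptr[i+1] delimits row i's nonzeros in col_idx/vals.
def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y

# 3x3 example whose rows hold 1, 3, and 1 nonzeros -- exactly the
# irregular row lengths that make a static row-block partition
# load-imbalanced on wide SIMD hardware.
row_ptr = [0, 1, 4, 5]
col_idx = [0, 0, 1, 2, 2]
vals    = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]
print(csr_spmv(row_ptr, col_idx, vals, x))  # [2.0, 8.0, 5.0]
```

Because each row's work is proportional to its nonzero count, any scheme that assigns whole rows (or fixed row blocks) to SIMD lanes inherits the matrix's irregularity, which is why the strategies listed above only mitigate, rather than eliminate, the imbalance.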
“…Therefore, improving the performance of SpMV using the most widely supported CSR format has also gained plenty of attention [1,2,13,14,15,16,17,18]. Most of the related work [1,2,13,14,15,19] has focused on improving the row block method for the CSR-based SpMV.…”
Section: Introduction
confidence: 99%
“…The resulting GPU algorithm has been tested with the sparse matrix set used in studies of multi-core [6] and GPU [7] matrix-vector product performance. Figure 5 shows that the convergence of the default SPAI (with sparsity pattern identical to A T ) is highly competitive with the default CUSP-Bridson preconditioner for the GMRES linear solver.…”
Section: Figure
confidence: 99%
“…Although platform-specific tuning is known to give significant efficiency improvements (see the study of Williams et al [15]), we chose not to apply it here. In this way we keep the RSB algorithms general and the code portable, thus retaining the possibility of further optimizations.…”
Section: Introduction and Related Literature
confidence: 99%