SC14: International Conference for High Performance Computing, Networking, Storage and Analysis 2014
DOI: 10.1109/sc.2014.69
Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications

Cited by 123 publications (90 citation statements)
References 10 publications
“…Their main bottlenecks were the limited size of shared memory, an expensive global scan operation, and random non-coalesced memory accesses. Patidar [29] proposed two methods with a particular focus on a large number of buckets (more than 4k): one based on heavy use of shared-memory atomic operations (to compute block-level histograms and intra-bucket orders), and the other on iterative use of a basic binary split for each bucket (or group of buckets). Patidar combined these methods hierarchically to get his best results.…”
Section: Multisplit and Histograms (mentioning)
confidence: 99%
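For context, the block-level histogram step described in this excerpt can be sketched in CUDA as follows. This is a minimal illustration of shared-memory atomic binning, assuming a fixed NUM_BUCKETS and a trivial bucket function; it is not Patidar's actual implementation.

#define NUM_BUCKETS 256

__global__ void blockHistogram(const unsigned int *keys, int n,
                               unsigned int *blockHist)  // gridDim.x * NUM_BUCKETS entries
{
    __shared__ unsigned int hist[NUM_BUCKETS];

    // Zero the shared-memory histogram cooperatively.
    for (int b = threadIdx.x; b < NUM_BUCKETS; b += blockDim.x)
        hist[b] = 0;
    __syncthreads();

    // Each thread bins its elements; contention stays in fast
    // shared-memory atomics rather than global ones.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        unsigned int bucket = keys[i] % NUM_BUCKETS;  // illustrative bucket function
        atomicAdd(&hist[bucket], 1u);
    }
    __syncthreads();

    // Publish per-block counts; a subsequent scan over blockHist would
    // turn them into scatter offsets (the expensive global step noted above).
    for (int b = threadIdx.x; b < NUM_BUCKETS; b += blockDim.x)
        blockHist[blockIdx.x * NUM_BUCKETS + b] = hist[b];
}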
“…On co-processors composed of a large number of lightweight single-instruction, multiple-data (SIMD) units, this problem can heavily degrade the performance of the SpMV operation. Even though many strategies, such as vectorization [1,2,13], data streaming [14], memory coalescing [33], static or dynamic binning [14,15], Dynamic Parallelism [15], and dynamic row distribution [19], have been proposed for the row block method, it is still impossible to achieve nearly perfect load balancing in the general case, simply because row sizes are irregular and unpredictable.…”
Section: CSR Format and CSR-based SpMV Algorithms (mentioning)
confidence: 99%
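The row block method referred to here assigns one thread per CSR row, which is where the imbalance originates: each thread's loop trip count equals its row length. A minimal sketch of this standard scalar CSR SpMV kernel (generic, not any specific cited implementation):

__global__ void spmvCsrScalar(int nRows,
                              const int *rowPtr, const int *colIdx,
                              const double *vals, const double *x,
                              double *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < nRows) {
        double sum = 0.0;
        // Trip count = row length; one long row stalls its whole warp,
        // which is the load-imbalance problem described above.
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += vals[j] * x[colIdx[j]];
        y[row] = sum;
    }
}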
“…Therefore, improving the performance of SpMV using the most widely supported CSR format has also gained plenty of attention [1,2,13,14,15,16,17,18]. Most of the related work [1,2,13,14,15,19] has focused on improving the row block method for CSR-based SpMV.…”
Section: Introduction (mentioning)
confidence: 99%
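Among the strategies listed in the previous excerpt, vectorization is the most common refinement of the row block method: a warp cooperates on each row and reduces its partial sums with shuffles. A minimal, generic CUDA sketch, assuming a warp size of 32 and not tied to any specific cited implementation:

__global__ void spmvCsrVector(int nRows,
                              const int *rowPtr, const int *colIdx,
                              const double *vals, const double *x,
                              double *y)
{
    int lane = threadIdx.x & 31;
    int row  = (blockIdx.x * blockDim.x + threadIdx.x) >> 5;  // one warp per row
    if (row < nRows) {
        double sum = 0.0;
        // Lanes stride across the row, giving coalesced access to vals/colIdx.
        for (int j = rowPtr[row] + lane; j < rowPtr[row + 1]; j += 32)
            sum += vals[j] * x[colIdx[j]];
        // Warp-level tree reduction of the partial sums.
        for (int off = 16; off > 0; off >>= 1)
            sum += __shfl_down_sync(0xffffffffu, sum, off);
        if (lane == 0)
            y[row] = sum;
    }
}

This mitigates but does not remove the imbalance: rows shorter than a warp leave lanes idle, which is why the binning and Dynamic Parallelism strategies mentioned above exist.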
“…The SELL-C-σ format has been improved and optimized for GPUs by Anzt et al. [3] by introducing some zero padding to satisfy the memory constraints of the GPU architecture; the result is called the SELL-P format. Ashari et al. [4] proposed an adaptive algorithm for SpMV using the CSR format (called ACSR), in which additional metadata are used with the standard CSR format to help achieve better GPU utilization. ACSR is mainly proposed for adaptive graph applications, where the structure of the graph adjacency matrix changes frequently, making a heavy preprocessing step a serious bottleneck.…”
Section: Related Work (mentioning)
confidence: 99%
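As a rough illustration of the kind of metadata ACSR [4] attaches to an unmodified CSR matrix, the host-side sketch below groups rows into power-of-two bins by row length, so that each bin can later be serviced by an appropriately sized kernel. The bin boundaries, names, and layout are illustrative assumptions, not the paper's actual data structures; the cost of exactly this kind of pass is the preprocessing bottleneck the excerpt mentions.

#include <vector>

struct RowBins {
    // bins[k] holds the ids of rows whose nnz count is at most 2^k
    // (and greater than 2^(k-1) for k > 0).
    std::vector<std::vector<int>> bins;
};

RowBins binRowsByLength(const std::vector<int> &rowPtr, int maxBins = 32)
{
    RowBins rb;
    rb.bins.resize(maxBins);
    for (int row = 0; row + 1 < (int)rowPtr.size(); ++row) {
        int nnz = rowPtr[row + 1] - rowPtr[row];
        int k = 0;
        while ((1 << k) < nnz && k + 1 < maxBins)  // smallest k with 2^k >= nnz
            ++k;
        rb.bins[k].push_back(row);
    }
    return rb;
}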