2019
DOI: 10.1177/1094342019886628
Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms

Abstract: The sparse matrix–vector multiplication (SpMV) kernel dominates the computing cost in numerous applications. Most existing studies dedicated to improving this kernel have targeted just one type of processing unit, mainly multicore CPUs or graphics processing units (GPUs), and have not explored the potential of the recent, rapidly emerging CPU-GPU heterogeneous platforms. To take full advantage of these heterogeneous systems, the input sparse matrix has to be partitioned on different available proces…
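
For context, SpMV computes y = A·x for a sparse matrix A. Below is a minimal C sketch of the kernel using the common compressed sparse row (CSR) layout; the function name spmv_csr is illustrative, and this sequential version only shows the irregular, memory-bound access pattern that partitioning schemes like the paper's must balance across devices:

```c
#include <stddef.h>

/* y = A * x, with A stored in compressed sparse row (CSR) form:
 * row_ptr[i]..row_ptr[i+1] indexes the nonzeros of row i,
 * col_idx[k] is the column of the k-th nonzero, val[k] its value. */
void spmv_csr(size_t n_rows, const size_t *row_ptr,
              const size_t *col_idx, const double *val,
              const double *x, double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];   /* irregular gather on x */
        y[i] = sum;
    }
}
```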

Cited by 14 publications (5 citation statements) · References 32 publications (68 reference statements)
“…SpMV in Commodity Systems. Numerous prior works propose optimized SpMV algorithms for CPUs [5, 37, 59, 60, 62, 63, 108, 136, 165, 171, 172, 182, 193, 204, 209, 235-237, 245, 247, 250, 251, 255, 256, 274], GPUs [18, 27, 48, 61, 70, 91, 107, 162, 203, 227, 231, 233, 243, 253, 260, 261, 265], heterogeneous CPU-GPU systems [10, 19, 34, 116, 117, 202, 262, 264], and distributed CPU systems [24, 28, 38, 40, 85, 125, 150, 161, 183, 196, 201, 242]. Optimized SpMV kernels for processor-centric CPU and GPU systems exploit the shared memory model of these systems and data locality in deep cache hierarchies.…”
Section: Related Work
confidence: 99%
“…2D Partitioning Techniques. We analyze scalability with the number of DPUs for the 2D partitioning techniques. Figures 18, 19 and 20 compare the performance of the equally-sized, equally-wide and variable-sized schemes, respectively, using the COO format and the int32 data type, as the number of DPUs increases.…”
confidence: 99%
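
The “equally-sized” scheme referenced above divides the matrix into a grid of tiles of equal dimensions, each assigned to one worker (a DPU in the cited study). A hedged C sketch of mapping COO nonzeros onto such a 2D grid follows; the names tile_of and count_tile_nnz and the p × q grid parameters are illustrative assumptions, not the cited paper's API:

```c
#include <stddef.h>

/* Map a COO nonzero (r, c) of an m x n matrix onto a p x q grid of
 * equally-sized tiles; tile (bi, bj) is handled by one worker. */
static size_t tile_of(size_t r, size_t c, size_t m, size_t n,
                      size_t p, size_t q)
{
    size_t tile_h = (m + p - 1) / p;   /* rows per tile, rounded up */
    size_t tile_w = (n + q - 1) / q;   /* cols per tile, rounded up */
    size_t bi = r / tile_h, bj = c / tile_w;
    return bi * q + bj;                /* linear tile id in [0, p*q) */
}

/* Count nonzeros per tile -- the load-balance metric that separates
 * equally-sized tiling from the equally-wide/variable-sized variants. */
void count_tile_nnz(size_t nnz, const size_t *row_idx,
                    const size_t *col_idx, size_t m, size_t n,
                    size_t p, size_t q, size_t *tile_nnz /* p*q, zeroed */)
{
    for (size_t k = 0; k < nnz; ++k)
        tile_nnz[tile_of(row_idx[k], col_idx[k], m, n, p, q)]++;
}
```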
“…In recent years, many researchers have focused on fully exploiting multiple different types of computing devices in heterogeneous platforms to cooperatively accelerate the execution of specific applications, such as the minimal hitting set enumeration problem [11], protein sequence alignment algorithms [12], sparse matrix-vector multiplication [13], solidification modeling [14], and high-resolution image restoration algorithms [15]. The above research can fully utilize both multi-core CPUs and many-core GPUs/MICs to accelerate the execution of specified computational tasks, and the experimental results show that performance is significantly improved compared with utilizing CPUs, GPUs, or MICs alone.…”
Section: Introduction
confidence: 99%
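
A common way to realize the cooperative CPU-GPU execution described above is to split the matrix rows so that each device receives a share of the nonzeros proportional to its relative throughput. The C sketch below is a minimal version of that heuristic under stated assumptions: split_row and gpu_share are hypothetical names, gpu_share is a measured or assumed throughput fraction, and the paper's actual partitioner is more elaborate:

```c
#include <stddef.h>

/* Pick a split row s so rows [0, s) go to the GPU and rows [s, n_rows)
 * to the CPU, dividing nonzeros in proportion to gpu_share (e.g. 0.8).
 * Balancing on nonzeros rather than rows is the usual heuristic,
 * since SpMV cost tracks nnz, not row count. */
size_t split_row(size_t n_rows, const size_t *row_ptr, double gpu_share)
{
    size_t total  = row_ptr[n_rows];               /* total nnz */
    size_t target = (size_t)(gpu_share * (double)total);
    size_t s = 0;
    while (s < n_rows && row_ptr[s + 1] <= target)
        ++s;
    return s;  /* run SpMV on rows [0,s) on GPU, [s,n_rows) on CPU */
}
```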
“…In order to adapt to the underlying architecture of hardware accelerators, researchers have focused on restructuring the SpMV algorithm to improve the accelerators' computing performance, targeting the Intel Xeon Phi [10,11], general-purpose graphics processing units (GPGPUs) [12,13], AMD (Advanced Micro Devices) hardware [14,15], field-programmable gate arrays (FPGAs) [16,17], and so on. The Sunway TaihuLight supercomputer [18] is equipped with the SW26010P many-core processor, whose unique hardware architecture provides strong parallel computing capabilities.…”
Section: Introduction
confidence: 99%