2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing
DOI: 10.1109/pdp.2011.92

Scaleable Sparse Matrix-Vector Multiplication with Functional Memory and GPUs

Abstract: Sparse matrix-vector multiplication on GPUs faces a serious problem when the vector is too large to be stored in the GPU's device memory. To solve this problem, we propose a novel software-hardware hybrid method for a heterogeneous system with GPUs and functional memory modules connected by PCI Express. The functional memory provides a huge memory capacity and scatter/gather operations. We perform a preliminary evaluation of the proposed method using a sparse matrix benchmark collection…
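The core operation the abstract describes can be sketched in plain code. The following is an illustrative reconstruction, not the paper's implementation: the indirect reads `x[col[j]]` in CRS-format (compressed row storage) SpMV are exactly the gather operations that the proposed functional memory modules would serve when the vector `x` exceeds GPU device memory. All names here are our own.

```python
def spmv_crs(row_ptr, col, val, x):
    """Compute y = A @ x for a sparse matrix A stored in CRS format:
    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i,
    col[j] and val[j] give the column index and value of nonzero j."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            # Gather: indirect access into the (possibly huge) vector x.
            y[i] += val[j] * x[col[j]]
    return y

# 2x2 example: A = [[1, 2], [0, 3]], x = [1, 1]
print(spmv_crs([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0], [1.0, 1.0]))  # -> [3.0, 3.0]
```

The matrix data (`row_ptr`, `col`, `val`) streams sequentially and is GPU-friendly; only the gathers into `x` are irregular, which is why offloading them to scatter/gather-capable memory is attractive.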


Cited by 4 publications (9 citation statements)
References 11 publications
“…Figure 7 shows the effective bandwidth of vector accesses during the execution of sparse matrix-vector double-precision multiplication with two kinds of sparse matrix storage formats. The storage formats are CRS and our format for GPUs [5]. As far as this experiment goes, we do not observe a big difference between the two matrix storage formats.…”
Section: Sparse Matrix-Vector Multiplication Access Bandwidth (mentioning)
confidence: 56%
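The "effective bandwidth of vector accesses" in the statement above can be estimated with a simple model. This is a hedged sketch under our own assumptions, not necessarily the cited paper's formula: each nonzero of the matrix triggers one double-precision (8-byte) gather from the vector, so the effective gather bandwidth is the total bytes gathered divided by elapsed time.

```python
def vector_access_bandwidth_gbs(nnz, elapsed_s, bytes_per_elem=8):
    """Effective bandwidth (GB/s) of vector gathers during one SpMV:
    nnz indirect loads of bytes_per_elem bytes over elapsed_s seconds."""
    return nnz * bytes_per_elem / elapsed_s / 1e9

# Example: 1M nonzeros gathered in 1 ms -> 8.0 GB/s effective bandwidth.
print(vector_access_bandwidth_gbs(1_000_000, 0.001))
```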
“…For the evaluation of the GPU experiments, we use address-trace data to which we apply the pre-processing algorithm [5] for the proposed extended memory and GPUs.…”
Section: (3) Accessing Vectors in Sparse Matrix-Vector Multiplication (mentioning)
confidence: 99%
“…This is more logical, as the data needed to solve the system of equations is generated and already resides in the GPU memory, so no transfer cost is incurred. The implementation of SpMV kernels on graphics hardware has been the subject of much recent research [23][24][25][26][27][28][29][30][31][32][33]. It has been shown that a naïve implementation of the SpMV kernel is quite ineffective on such platforms [23].…”
Section: Solution Of Implicit Pressure Equation (mentioning)
confidence: 99%