2015
DOI: 10.1016/j.jpdc.2015.06.010

A framework for general sparse matrix–matrix multiplication on GPUs and heterogeneous processors

Highlights:
• We design a framework for SpGEMM on modern manycore processors using the CSR format.
• We present a hybrid method for pre-allocating the resulting sparse matrix.
• We propose an efficient parallel insert method for long rows of the resulting matrix.
• We develop a heuristic-based load balancing strategy.
• Our approach significantly outperforms other known CPU and GPU SpGEMM methods.

Abstract: General sparse matrix–matrix multiplication (SpGEMM) is a fundamental building block for numero…
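The highlights outline a CSR-based, row-wise SpGEMM framework. As a point of reference for what that computation looks like, here is a minimal sequential Gustavson-style CSR SpGEMM sketch; all type and function names are illustrative, and the paper's actual contributions (GPU parallelization, result pre-allocation, load balancing) are layered on top of this baseline:

```cpp
#include <vector>

// Minimal CSR container; field names are illustrative, not from the paper.
struct Csr {
    int rows = 0, cols = 0;
    std::vector<int> rowPtr, colIdx;
    std::vector<double> vals;
};

// Gustavson's row-wise SpGEMM: C = A * B with one dense accumulator,
// reused across rows via a per-row marker.
Csr spgemm(const Csr& A, const Csr& B) {
    Csr C;
    C.rows = A.rows; C.cols = B.cols;
    C.rowPtr.assign(A.rows + 1, 0);
    std::vector<double> acc(B.cols, 0.0);  // dense accumulator
    std::vector<int> marker(B.cols, -1);   // last row that touched column j
    for (int i = 0; i < A.rows; ++i) {
        std::vector<int> live;             // columns active in row i of C
        for (int ap = A.rowPtr[i]; ap < A.rowPtr[i + 1]; ++ap) {
            int k = A.colIdx[ap];
            double aik = A.vals[ap];
            for (int bp = B.rowPtr[k]; bp < B.rowPtr[k + 1]; ++bp) {
                int j = B.colIdx[bp];
                if (marker[j] != i) { marker[j] = i; acc[j] = 0.0; live.push_back(j); }
                acc[j] += aik * B.vals[bp];
            }
        }
        for (int j : live) { C.colIdx.push_back(j); C.vals.push_back(acc[j]); }
        C.rowPtr[i + 1] = (int)C.colIdx.size();
    }
    return C;
}
```

For brevity the sketch leaves column indices within each output row unsorted; production implementations typically sort or merge them.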

Cited by 89 publications (72 citation statements)
References 49 publications
“…Notable applications using such operations today include, for example, machine learning [14], weather pattern analysis [15], and shortest-path problems [16]. 2) Different operations have different compute intensities: higher for matrix multiplication, lower for vector addition, etc.…”
Section: B. Tests and Results
confidence: 99%
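As a rough, roofline-style illustration of why those intensities differ (standard back-of-envelope arithmetic, not taken from the cited works): dense matrix multiplication reuses each operand entry many times, whereas vector addition touches each word only once:

```latex
% Arithmetic intensity I = flops / words moved (order-of-magnitude sketch).
% Dense n x n matrix multiplication: 2n^3 flops over ~3n^2 words.
% Vector addition z = x + y: n flops over ~3n words.
\[
I_{\mathrm{GEMM}} = \frac{2n^{3}}{3n^{2}} = \tfrac{2}{3}\,n
\quad\text{(grows with $n$, compute-bound)},
\qquad
I_{\mathrm{vec\,add}} = \frac{n}{3n} = \tfrac{1}{3}
\quad\text{(constant, memory-bound)}.
\]
```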
“…We use NVIDIA K40m (Kepler) and Titan X (Pascal) GPUs for comparing the performance of our algorithm and several existing methods (CUSP [1], cuSPARSE, bhSPARSE [5] and RMerge [3]) that compute C = A² in double precision. The CUDA versions are 7.0 and 8.0 on the K40m and Titan X, respectively.…”
Section: Performance Evaluation and Conclusion
confidence: 99%
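For context on the comparison baselines, squaring a matrix with the CUSP library can be sketched as below (compiled with nvcc against the CUSP headers); the 5-point Poisson test matrix is an illustrative choice, not one of the benchmark inputs from the quoted paper:

```cpp
#include <cusp/csr_matrix.h>
#include <cusp/multiply.h>
#include <cusp/gallery/poisson.h>

int main() {
    // Illustrative input: 5-point Poisson matrix on a 256 x 256 grid.
    cusp::csr_matrix<int, double, cusp::device_memory> A;
    cusp::gallery::poisson5pt(A, 256, 256);

    // C = A * A, i.e. A^2, in double precision on the GPU.
    cusp::csr_matrix<int, double, cusp::device_memory> C;
    cusp::multiply(A, A, C);
    return 0;
}
```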
“…The biggest challenges are (i) that the structure of the resulting matrix depends on the input matrices, (ii) that the organization of the entries in the resulting matrix requires communication between threads, and (iii) that the number of operations carried out by individual threads may vary strongly. To provide an efficient implementation we take advantage of the algorithmic description of bhSPARSE [LV15]. We tackle the aforementioned issues in a four-stage approach: In the first stage, we compute an upper bound for the number of nonzeros in each column of the result matrix, which allows for allocating sufficient storage.…”
Section: Parallel GPU Implementation
confidence: 99%
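The first of those four stages, bounding the nonzero count of the result, has a simple sequential analogue for CSR inputs (the quoted work bounds columns; the row-wise version is sketched here with illustrative names):

```cpp
#include <vector>

// Upper bound on nnz per row of C = A * B for CSR inputs: row i of C
// can hold at most the sum of nnz over the B-rows indexed by the column
// indices in row i of A (duplicate column hits are not deduplicated).
std::vector<int> nnzUpperBound(const std::vector<int>& aRowPtr,
                               const std::vector<int>& aColIdx,
                               const std::vector<int>& bRowPtr) {
    int rows = (int)aRowPtr.size() - 1;
    std::vector<int> ub(rows, 0);
    for (int i = 0; i < rows; ++i)
        for (int p = aRowPtr[i]; p < aRowPtr[i + 1]; ++p) {
            int k = aColIdx[p];
            ub[i] += bRowPtr[k + 1] - bRowPtr[k];
        }
    return ub;
}
```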
“…Note that Mv implies summation along the compressed direction. For reference, we report the timings for cuSPARSE [NVI15] and bhSPARSE [LV15].…”
confidence: 99%
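A minimal CSR SpMV sketch makes "summation along the compressed direction" concrete: in CSR the compressed direction is the row, so each output entry accumulates over one row's stored entries (illustrative code, not from either cited library):

```cpp
#include <vector>

// CSR SpMV y = M * v: each y[i] sums the products of row i's stored
// values with the gathered entries of v, i.e. the reduction runs
// along the compressed (row) direction.
std::vector<double> spmv(const std::vector<int>& rowPtr,
                         const std::vector<int>& colIdx,
                         const std::vector<double>& vals,
                         const std::vector<double>& v) {
    int rows = (int)rowPtr.size() - 1;
    std::vector<double> y(rows, 0.0);
    for (int i = 0; i < rows; ++i)
        for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p)
            y[i] += vals[p] * v[colIdx[p]];
    return y;
}
```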