A high performance algorithm using pre-processing for the sparse matrix-vector multiplication

Agarwal, Gustavson, Zubair

doi:10.1109/superc.1992.236712

Cited by 25 publications

(30 citation statements)

References 6 publications

“…There have been proposed two strategies to avoid padding in the literature: (a) decompose the original matrix into two or more matrices, where each matrix contains dense subblocks of some common pattern (e.g., rectangular, diagonal blocks, etc.), while the last matrix contains the remainder elements in a standard sparse storage format [1], and (b) use variable-sized blocks [12], [13]. In the following, we will present each blocking method in more detail.…”

Section: An Overview Of Blocking Storage Formats (mentioning)

confidence: 99%

“…A version of this format has been initially proposed in [1] as part of a decomposed method, which extracted common dense subblocks from the input matrix. A similar format, called RSDIAG, is also presented in [15], but it maintains an additional structure that stores the total number of diagonals in each segment.…”

Section: A Blocking With Padding (mentioning)

confidence: 99%

“…Both formats try to exploit small two-dimensional dense subblocks inside the sparse matrix with their main difference being that BCSR imposes a strict alignment to its blocks at specific row- and column-boundaries. Agarwal et al. [1] decompose the input matrix by extracting regular common patterns, such as dense subblocks and partial diagonals. Similarly, Pinar and Heath [12] decompose the original matrix into two submatrices: a matrix with horizontal one-dimensional dense subblocks without padding and a matrix in CSR format containing the remainder elements.…”

Section: Introduction (mentioning)

confidence: 99%
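
The decomposition described in the statement above (one part holding horizontal one-dimensional dense subblocks without padding, one CSR part holding the remainder) can be illustrated with a minimal Python sketch. This is not the implementation from any of the cited papers: the function names, the `min_len` threshold, and the tuple layout of `blocks` are hypothetical choices made for illustration, and column indices are assumed to be sorted within each row.

```python
def split_runs(row_ptr, col_idx, vals, min_len=2):
    """Split a CSR matrix into dense horizontal runs plus a CSR remainder.

    Assumes col_idx is sorted within each row. A "run" is a maximal
    sequence of consecutive column indices in one row; runs shorter than
    min_len (a hypothetical threshold) stay in the remainder.
    """
    n_rows = len(row_ptr) - 1
    blocks = []                      # entries: (row, first_col, [values])
    rem_ptr, rem_col, rem_val = [0], [], []
    for i in range(n_rows):
        k, end = row_ptr[i], row_ptr[i + 1]
        while k < end:
            r = k + 1
            # Extend the run while the column indices stay consecutive.
            while r < end and col_idx[r] == col_idx[r - 1] + 1:
                r += 1
            if r - k >= min_len:
                blocks.append((i, col_idx[k], vals[k:r]))
            else:
                rem_col.extend(col_idx[k:r])
                rem_val.extend(vals[k:r])
            k = r
        rem_ptr.append(len(rem_col))
    return blocks, (rem_ptr, rem_col, rem_val)


def spmv_decomposed(n_rows, blocks, rem, x):
    """y = A x, computed as (dense runs) * x + (CSR remainder) * x."""
    rem_ptr, rem_col, rem_val = rem
    y = [0.0] * n_rows
    # Dense runs: one column index per run, no padding zeros stored.
    for i, c0, run in blocks:
        for off, v in enumerate(run):
            y[i] += v * x[c0 + off]
    # Remainder: ordinary CSR traversal.
    for i in range(n_rows):
        for k in range(rem_ptr[i], rem_ptr[i + 1]):
            y[i] += rem_val[k] * x[rem_col[k]]
    return y


# 3 x 4 example: row 0 holds one run of length 3, the rest is remainder.
row_ptr = [0, 3, 5, 6]
col_idx = [0, 1, 2, 1, 3, 0]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
blocks, rem = split_runs(row_ptr, col_idx, vals)
y = spmv_decomposed(3, blocks, rem, [1.0, 2.0, 3.0, 4.0])
```

In this sketch each run stores a single starting column instead of one index per nonzero, which is where the index savings of a decomposed format would come from.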

Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels

Karakasis, Goumas, Koziris (2009)

2009 International Conference on Parallel Processing

Abstract: Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architecture. The main problem of SpMV is its high demands on memory bandwidth, which cannot yet be abundantly offered by modern commodity architectures. One of the most promising optimization techniques for SpMV is blocking, which can reduce the indexing structures for storing a sparse matrix, and therefore alleviate the pressure on the memory subsystem. However, blocking methods can severely degrade performance if not used properly. In this paper, we study and evaluate a number of representative blocking storage formats and present a performance model that can accurately select the most suitable blocking storage format and the corresponding block shape and size for a specific sparse matrix. Our model considers both the memory and computational part of the kernel, which can be non-negligible when applying blocking, and also assumes an overlapping of memory accesses and computations that modern commodity architectures can offer through hardware prefetching mechanisms.
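
To make the trade-off in the abstract concrete, the following is a minimal Python sketch of an aligned blocked format in the spirit of BCSR. It is an illustration under assumptions, not the storage layout or code evaluated in the paper: the function names are hypothetical, the input is a small dense list of lists, and the matrix dimensions are assumed to be multiples of the block shape `r` x `c`.

```python
def to_bcsr(dense, r, c):
    """Build an aligned r x c blocked format from a dense list of lists.

    Assumes both matrix dimensions are multiples of r and c. Any block
    with at least one nonzero is stored in full, so zeros inside a
    stored block become explicit padding.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    brow_ptr, bcol_idx, bvals = [0], [], []
    for bi in range(0, n_rows, r):
        for bj in range(0, n_cols, c):
            block = [dense[bi + di][bj + dj]
                     for di in range(r) for dj in range(c)]
            if any(v != 0 for v in block):
                bcol_idx.append(bj // c)   # one index per block
                bvals.append(block)        # row-major block values
        brow_ptr.append(len(bcol_idx))
    return brow_ptr, bcol_idx, bvals


def spmv_bcsr(brow_ptr, bcol_idx, bvals, r, c, x):
    """y = A x over the blocked structure returned by to_bcsr."""
    y = [0.0] * ((len(brow_ptr) - 1) * r)
    for bi in range(len(brow_ptr) - 1):
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            j0 = bcol_idx[k] * c
            blk = bvals[k]
            for di in range(r):
                for dj in range(c):
                    y[bi * r + di] += blk[di * c + dj] * x[j0 + dj]
    return y


dense = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 6.0],
]
brow_ptr, bcol_idx, bvals = to_bcsr(dense, 2, 2)
stored_values = sum(len(b) for b in bvals)
y = spmv_bcsr(brow_ptr, bcol_idx, bvals, 2, 2, [1.0, 1.0, 1.0, 1.0])
```

For this matrix, CSR would keep six column indices (one per nonzero), while the 2 x 2 blocked form keeps two block-column indices but stores eight values, two of them padding zeros. A poorly matched block shape increases that padding, which is one way blocking can hurt rather than help.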

“…There exist several methods in the literature proposed to improve the cache locality for the SpMxV operations by reordering the rows and/or columns of the matrix by using graph/hypergraph partitioning [6], [7], [8], [9], [10] and other techniques [11], [12], [13], [14]. The recommendation algorithm used in theadvisor is direction aware.…”

Section: Introduction (mentioning)

confidence: 99%

Fast Recommendation on Bibliographic Networks

Küçüktunç, Kaya, Saule, et al. (2012)

2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining

Abstract: Graphs and matrices are widely used in algorithms for social network analyses. Since the number of interactions is much less than the possible number of interactions, the graphs and matrices used in the analyses are usually sparse. In this paper, we propose an efficient implementation of a sparse-matrix computation which arises in our publicly available citation recommendation service called theadvisor. The recommendation algorithm uses a sparse matrix generated from the citation graph. We observed that the nonzero pattern of this matrix is highly irregular and the computation suffers from high number of cache misses. We propose techniques for storing the matrix in memory efficiently and reducing the number of cache misses. Experimental results show that our techniques are highly efficient on reducing the query processing time which is highly crucial for a web service.

“…Performing this operation using the CSR format is trivial, but it was observed that the maximum performance in Mflop/s sustained by a naïve implementation can reach only a small part of the machine peak performance [14]. As a means of transcending this limit, several optimization techniques have been proposed, such as reordering [24,28,29,32], data compression [22,33], blocking [1,15,23,24,28,29,31], vectorization [4,11], loop unrolling [32] and jamming [21], and software prefetching [29]. Lately, the dissemination of multi-core computers have promoted multi-threading as an important tuning technique, which can be further combined with purely sequential methods.…”

Section: Introduction (mentioning)

confidence: 99%
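
For reference, the CSR product that the statement above calls trivial can be written as a short Python sketch. The array names `row_ptr`, `col_idx`, and `vals` are conventional but chosen here for illustration; this is the naive baseline, with none of the listed optimizations applied.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Naive y = A x for a matrix in compressed sparse row (CSR) form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]   # indirect access into x
        y[i] = acc
    return y


# 3 x 4 matrix: [[1, 2, 3, 0], [0, 4, 0, 5], [6, 0, 0, 0]]
y = spmv_csr(
    [0, 3, 5, 6],
    [0, 1, 2, 1, 3, 0],
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.0, 2.0, 3.0, 4.0],
)
```

Each multiply-add reads one value, one column index, and one indirectly addressed element of `x`, which suggests why the kernel tends to be limited by memory traffic rather than arithmetic.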

Parallel Structurally-Symmetric Sparse Matrix-Vector Products on Multi-Core Processors

Batista, Ainsworth, Ribeiro

Proceedings of the Third International Conference on Parallel, Distributed, Grid and Cloud Computing for Engineering

We consider the problem of developing an efficient multi-threaded implementation of the matrix-vector multiplication algorithm for sparse matrices with structural symmetry. Matrices are stored using the compressed sparse row-column format (CSRC), designed for profiting from the symmetric non-zero pattern observed in global finite element matrices. Unlike classical compressed storage formats, performing the sparse matrix-vector product using the CSRC requires thread-safe access to the destination vector. To avoid race conditions, we have implemented two partitioning strategies. In the first one, each thread allocates an array for storing its contributions, which are later combined in an accumulation step. We analyze how to perform this accumulation in four different ways. The second strategy employs a coloring algorithm for grouping rows that can be concurrently processed by threads. Our results indicate that, although incurring an increase in the working set size, the former approach leads to the best performance improvements for most matrices.
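
The first partitioning strategy in the abstract (a private array per thread, followed by an accumulation step) can be sketched in Python as follows. Several details are assumptions, since the abstract does not specify them: the storage layout used here (a diagonal array `ad`, strictly lower entries `al` stored by rows with index arrays `ia` and `ja`, and the mirrored upper entries in `au`) is one plausible reading of a row-column format, the function name is hypothetical, and the threads are simulated by a sequential loop so the example stays deterministic.

```python
def spmv_csrc_local_buffers(ad, ia, ja, al, au, x, n_threads=2):
    """y = A x for a structurally symmetric matrix, with per-thread buffers.

    Assumed layout (not confirmed by the abstract):
      ad[i]  diagonal entry a_ii
      al[k]  strictly lower entry a_ij of row i, with j = ja[k]
      au[k]  mirrored upper entry a_ji
      ia     row pointers into ja / al / au
    """
    n = len(ad)
    # One private output buffer per thread avoids concurrent writes to y.
    buffers = [[0.0] * n for _ in range(n_threads)]
    chunk = (n + n_threads - 1) // n_threads
    for t in range(n_threads):       # each iteration stands in for one thread
        yt = buffers[t]
        for i in range(t * chunk, min(n, (t + 1) * chunk)):
            yt[i] += ad[i] * x[i]
            for k in range(ia[i], ia[i + 1]):
                j = ja[k]
                yt[i] += al[k] * x[j]   # contribution of a_ij to row i
                yt[j] += au[k] * x[i]   # contribution of a_ji to row j,
                                        # which may belong to another thread
    # Accumulation step: combine the private buffers into the result.
    return [sum(buf[i] for buf in buffers) for i in range(n)]


# A = [[2, 7, 0], [1, 3, 8], [0, 4, 5]]
y = spmv_csrc_local_buffers(
    ad=[2.0, 3.0, 5.0],
    ia=[0, 0, 1, 2],
    ja=[0, 1],
    al=[1.0, 4.0],
    au=[7.0, 8.0],
    x=[1.0, 2.0, 3.0],
)
```

The write to `yt[j]` is the access that would race on a shared destination vector. The private buffers remove the race at the cost of `n_threads` extra arrays of length `n`, which corresponds to the larger working set the abstract mentions.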
