A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures

Karakasis, Vasileios; Goumas, Georgios; Koziris, Nectarios

doi:10.1109/cse.2009.223

Cited by 28 publications

(18 citation statements)

References 9 publications


“…Table 1 lists sparse matrices used in our performance evaluation. These are all the matrices used in previous papers [26,21,9] that are larger than the 30 MB aggregate L2 cache of KNC (using 60 cores). A dense matrix stored in sparse format is also included.…”

Section: Understanding the Performance of SpMV on KNC (mentioning)

confidence: 98%

Efficient sparse matrix-vector multiplication on x86-based many-core processors

Liu

Smelyanskiy

Chow

et al. 2013

Proceedings of the 27th International ACM Conference on International Conference on Supercomputing


Sparse matrix-vector multiplication (SpMV) is an important kernel in many scientific applications and is known to be memory bandwidth limited. On modern processors with wide SIMD and large numbers of cores, we identify and address several bottlenecks which may limit performance even before memory bandwidth: (a) low SIMD efficiency due to sparsity, (b) overhead due to irregular memory accesses, and (c) load-imbalance due to non-uniform matrix structures. We describe an efficient implementation of SpMV on the Intel® Xeon Phi™ Coprocessor, codenamed Knights Corner (KNC), that addresses the above challenges. Our implementation exploits the salient architectural features of KNC, such as large caches and hardware support for irregular memory accesses. By using a specialized data structure with careful load balancing, we attain performance on average close to 90% of KNC's achievable memory bandwidth on a diverse set of sparse matrices. Furthermore, we demonstrate that our implementation is 3.52x and 1.32x faster, respectively, than the best available implementations on dual Intel® Xeon® Processor E5-2680 and the NVIDIA Tesla K20X architecture.


“…In this case, we can state that there are three nonzeros of the matrix in the positions ({0, 1}, 23), (2,12), and (2,19).…”

Section: Performance Evaluation Using Hardware Counters for Sampling (mentioning)

confidence: 96%

“…In [10], a performance model for the blocked SpMV, which allows to pick in nearly all cases the actual optimal block size, was presented. In a recent work [12], a comparative study of different blocking storage techniques for sparse matrices on several multicore platforms was performed. Vuduc et al [11] extended the notion of blocking in order to exploit variable block shapes by decomposing the original matrix to a proper sum of submatrices, storing each submatrix in a variation of the blocked CSR format.…”

Section: Related Work (mentioning)

confidence: 99%
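The blocked CSR idea referred to in the statement above can be illustrated with a minimal pure-Python sketch. It assumes fixed r x c blocks, matrix dimensions divisible by the block size, and a dense list-of-lists input; the names `to_bcsr` and `bcsr_spmv` are hypothetical and are not taken from any of the cited papers.

```python
def to_bcsr(dense, r, c):
    """Convert a dense list-of-lists matrix to a blocked CSR layout.

    Assumption: the matrix dimensions are multiples of r and c.
    Returns (brow_ptr, bcol_ind, bvalues), where bvalues stores each
    kept block in row-major order, including its explicit zeros.
    """
    n_rows, n_cols = len(dense), len(dense[0])
    if n_rows % r or n_cols % c:
        raise ValueError("matrix dimensions must be multiples of the block size")
    brow_ptr, bcol_ind, bvalues = [0], [], []
    for bi in range(n_rows // r):
        for bj in range(n_cols // c):
            block = [dense[bi * r + i][bj * c + j]
                     for i in range(r) for j in range(c)]
            if any(v != 0 for v in block):
                bcol_ind.append(bj)      # one column index per block
                bvalues.extend(block)    # zeros inside the block are fill-in
        brow_ptr.append(len(bcol_ind))
    return brow_ptr, bcol_ind, bvalues


def bcsr_spmv(brow_ptr, bcol_ind, bvalues, r, c, x):
    """Compute y = A * x for a matrix stored by to_bcsr."""
    y = [0.0] * ((len(brow_ptr) - 1) * r)
    for bi in range(len(brow_ptr) - 1):
        for k in range(brow_ptr[bi], brow_ptr[bi + 1]):
            base = k * r * c             # start of block k in bvalues
            col0 = bcol_ind[k] * c       # first matrix column of block k
            for i in range(r):
                for j in range(c):
                    y[bi * r + i] += bvalues[base + i * c + j] * x[col0 + j]
    return y


A = [[1, 2, 0, 0],
     [0, 3, 0, 0],
     [0, 0, 4, 0],
     [5, 0, 0, 6]]
brow_ptr, bcol_ind, bvalues = to_bcsr(A, 2, 2)
y = bcsr_spmv(brow_ptr, bcol_ind, bvalues, 2, 2, [1, 1, 1, 1])
print(brow_ptr, bcol_ind, len(bvalues), y)
```

In this example three 2 x 2 blocks are kept, so 12 values are stored for 6 nonzeros. That padding is the cost that trades off against the smaller index structure when a block size is chosen.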

“…In this way, EARs provide a list of the sampled accessed elements. For example, considering seven sampled events, the result of reading the counters could be as follows: y[0], x[23], y[1], y[2], x[12], x[19], and y[2]. As we have indicated earlier, accesses to x give us information about the exact column of the corresponding nonzero element of the matrix, whereas accesses to y provide information about the rows where the nonzero element can be placed.…”

Section: Performance Evaluation Using Hardware Counters for Sampling (mentioning)

confidence: 99%
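A short Python sketch can make the reasoning in this statement concrete. It assumes one particular inference rule, which is not confirmed by the quoted text: the candidate rows of a sampled x access are the rows of the nearest sampled y access before it and after it. The function name `candidate_nonzeros` is hypothetical.

```python
import re


def candidate_nonzeros(trace):
    """Infer candidate (rows, column) positions from a sampled access trace.

    Assumed rule (an illustration, not the cited authors' method): each
    sampled x[j] fixes the column j, and its row is one of the rows named
    by the closest sampled y access before it and the closest one after it.
    """
    samples = []
    for item in trace:
        m = re.fullmatch(r"\s*([xy])\s*\[\s*(\d+)\s*\]\s*", item)
        if m is None:
            raise ValueError(f"unrecognized sample: {item!r}")
        samples.append((m.group(1), int(m.group(2))))

    positions = []
    for pos, (name, index) in enumerate(samples):
        if name != "x":
            continue
        # Closest y sample before and after this x sample, if any.
        before = [i for n, i in reversed(samples[:pos]) if n == "y"][:1]
        after = [i for n, i in samples[pos + 1:] if n == "y"][:1]
        positions.append((sorted(set(before + after)), index))
    return positions


trace = ["y[0]", "x[23]", "y[1]", "y[2]", "x[12]", "x[19]", "y[2]"]
positions = candidate_nonzeros(trace)
print(positions)  # [([0, 1], 23), ([2], 12), ([2], 19)]
```

Under this rule the seven-sample trace yields one nonzero in column 23 whose row is either 0 or 1, and two nonzeros in row 2 at columns 12 and 19.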


Using sampled information: is it enough for the sparse matrix–vector product locality optimization?

Pichel

Lorenzo

Rivera

et al. 2012

Concurrency and Computation


One of the main factors that affect the performance of the sparse matrix-vector product (SpMV) is the low data reuse caused by the irregular and indirect memory access patterns. Different strategies to deal with this problem such as data reordering techniques have been proposed. The computational cost of these techniques is typically high because they consider all the nonzeros of the sparse matrix in order to find an appropriate permutation of rows and columns that improves the SpMV performance. In this paper, we analyze the possibility of increasing the locality of the SpMV using incomplete information in the reordering process. This partial information comes as a consequence of considering only a subset of the nonzero elements of the matrix. These nonzeros are obtained from the original matrix through a sampling process. In particular, two different sampling methods have been considered: a random sampling and an event-based sampling using hardware counters. We have detected that a small number of samples is enough to obtain quality reorderings. As a consequence, using sampling-based reorderings leads to noticeable performance improvements with respect to the non-reordered matrices, reaching speedup values up to 2.1. In addition, an important reduction in the computational time required by the reordering technique has been observed.

A, Eijkhout V, Langou J, Filippone S. Performance optimization and modeling of blocked sparse kernels. International Journal of High Performance Computing Applications 2007; 21(4):467-484.

11. Vuduc R, Moon H. Fast sparse matrix-vector multiplication by exploiting variable block structure. In Proceed-


“…But, in [10] we opt for balancing the number of columns per block. While in [4,5,9] authors try to balance the number of nonzero elements. The disadvantage of this type of decomposition is that it does not consider the phenomenon of "over loop".…”

Section: Related Work (mentioning)

confidence: 99%
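Balancing by nonzero count, as opposed to balancing by rows or columns, can be sketched as follows. This is a minimal illustration that assumes a CSR `row_ptr` array and contiguous row chunks; it is not the decomposition used in the cited works, and `split_rows_by_nnz` is a hypothetical name.

```python
from bisect import bisect_left


def split_rows_by_nnz(row_ptr, parts):
    """Split rows into contiguous chunks with roughly equal nonzero counts.

    row_ptr is the CSR row pointer array (length n_rows + 1), so
    row_ptr[i] is the number of nonzeros in rows 0..i-1.
    Returns parts + 1 row boundaries; chunk p covers rows
    bounds[p] .. bounds[p + 1] - 1.
    """
    n_rows = len(row_ptr) - 1
    total = row_ptr[-1]
    bounds = [0]
    for p in range(1, parts):
        target = total * p / parts
        # First row boundary whose prefix nonzero count reaches the target.
        cut = bisect_left(row_ptr, target)
        bounds.append(min(max(cut, bounds[-1]), n_rows))
    bounds.append(n_rows)
    return bounds


# Six rows holding 8, 1, 1, 1, 1 and 4 nonzeros respectively.
row_ptr = [0, 8, 9, 10, 11, 12, 16]
bounds = split_rows_by_nnz(row_ptr, 2)
chunk_nnz = [row_ptr[bounds[p + 1]] - row_ptr[bounds[p]] for p in range(2)]
print(bounds, chunk_nnz)
```

For this matrix an even split by rows would give chunks of 10 and 6 nonzeros, whereas the split by nonzero count assigns 8 to each chunk.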

SMVP Distribution Using Hypergraph Model and S-GBNZ Algorithm

Mehrez

Hamdi-Larbi

2013

2013 Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing


Sparse Matrix Vector Product (SMVP) is an important kernel in many scientific applications. Since the most common issues in parallel computing are communication and load balancing, our goal is to find a compromise to satisfy these two criteria. Thus, for distributing this kernel on a homogeneous multicore node cluster, we study a solution where we combine two different approaches: hypergraph model that reduces communication cost and S-GBNZ algorithm that ensures load balancing. Our theoretical contribution is validated through experimentations achieved on a multicore cluster within Grid5000.
