Summary
Sparse matrix‐vector multiplication (SpMV) is a crucial operation used for solving many engineering and scientific problems. In general, there is no single SpMV method that gives high performance for all sparse matrices. Even though there exist sparse matrix storage formats and SpMV implementations that yield high efficiency for certain matrix structures, using these methods may entail high preprocessing or format conversion costs. In this work, we present a new SpMV implementation, named CSRLenGoto, that can be utilized by preprocessing the Compressed Sparse Row (CSR) format of a matrix. This preprocessing phase is inexpensive enough for the associated cost to be compensated in just a few repetitions of the SpMV operation. CSRLenGoto is based on complete loop unrolling and gives performance improvements in particular for matrices whose mean row length is low. We parallelized our method by integrating it into a state‐of‐the‐art matrix partitioning approach as the kernel operation. We observed up to 2.46× and on the average 1.29× speedup with respect to Intel MKL's SpMV function for matrices with short‐ or medium‐length rows.