Optimizing Sparse Linear Algebra for Large-Scale Graph Analytics

Buono, Daniele; Gunnels, John A.; Que, Xinyu; Checconi, Fabio; Petrini, Fabrizio; Tuan, Tai-Ching; Long, Chris

doi:10.1109/mc.2015.228

Cited by 14 publications

(3 citation statements)

References 16 publications


“…The performance of the power method underlying PageRank is strongly determined by that of SPMV. Optimizing this particular computational kernel is challenging, especially for irregular large-scale problems such as those representing hyperlinked graphs for the Web [9].…”

Section: Related Work (mentioning)

confidence: 99%
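The statement above ties PageRank's cost to the SpMV inside each power-method step. A minimal sketch of that dependence follows, assuming a column-stochastic link matrix stored in CSR form with no dangling nodes. The function names, the damping factor of 0.85, and the tiny three-page graph are illustrative assumptions, not taken from the cited papers.

```python
def csr_spmv(indptr, indices, data, x):
    """Return y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # Irregular access: x is read at whatever column the nonzero sits in.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def pagerank(indptr, indices, data, damping=0.85, tol=1e-12, max_iter=500):
    """Power method; assumes A is column-stochastic with no dangling nodes."""
    n = len(indptr) - 1
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        spread = csr_spmv(indptr, indices, data, rank)  # one SpMV per iteration
        new_rank = [(1.0 - damping) / n + damping * s for s in spread]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical graph with links 0->1, 0->2, 1->2, 2->0.
# Entry A[i][j] = 1 / outdegree(j) when page j links to page i.
indptr = [0, 1, 2, 4]
indices = [2, 0, 0, 1]
data = [1.0, 0.5, 0.5, 1.0]
ranks = pagerank(indptr, indices, data)
```

Each iteration performs exactly one SpMV plus a cheap vector update, so the inner loop of `csr_spmv` is where the running time concentrates.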

High-Performance GPU Implementation of PageRank with Reduced Precision Based on Mantissa Segmentation

Grützmacher, Anzt, Scheidegger, et al. 2018

2018 IEEE/ACM 8th Workshop on Irregular Applications: Architectures and Algorithms (IA3)


We address the acceleration of the PageRank algorithm for web information retrieval on graphics processing units (GPUs) via a modular precision framework that adapts the data format in memory to the numerical requirements as the iteration converges. In detail, we abandon the IEEE 754 single- and double-precision number representation formats, employed in the standard implementation of PageRank, to instead store the data in memory in specialized formats. Furthermore, we avoid data duplication by leveraging a data layout based on mantissa segmentation. Our evaluation on a V100 graphics card from NVIDIA shows acceleration factors of up to 30% with respect to the standard algorithm operating in double precision.
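The idea of mantissa segmentation can be illustrated with a small sketch. The abstract does not give the actual formats or memory layout, so the split below (one 64-bit double into a 32-bit head and a 32-bit tail, held in separate lists) is an assumption made only for illustration, as are the function names.

```python
import struct


def split_double(value):
    """Split a float64 into (head, tail) 32-bit segments.

    head holds the sign, the 11 exponent bits and the top 20 mantissa bits;
    tail holds the remaining 32 mantissa bits.
    """
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits >> 32, bits & 0xFFFFFFFF


def join_segments(head, tail=0):
    """Rebuild a float64; omitting tail yields a truncated approximation."""
    return struct.unpack("<d", struct.pack("<Q", (head << 32) | tail))[0]


values = [0.1, 3.141592653589793, -2.5e-7]
heads, tails = zip(*(split_double(v) for v in values))

# Early iterations could read only the heads (half the memory traffic).
approx = [join_segments(h) for h in heads]
# Later iterations read both segments and recover the exact doubles,
# without ever storing a second, lower-precision copy of the data.
exact = [join_segments(h, t) for h, t in zip(heads, tails)]
```

Because the head alone is a valid, lower-precision encoding of the same number, one stored copy serves both precisions, which is the sense in which such a layout avoids data duplication.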


“…For both algorithms we provide a compact data representations that allows efficient and cache-friendly execution, offering predictable locality and performance. We presented an early version of the blocked algorithm in [8].…”

Section: Contributions (mentioning)

confidence: 99%

“…Recently, many research efforts have tried to provide powerful ways to extract structural properties from graphs. For example, recent studies show how several graph algorithms can be recast as a sequence of linear algebraic operations, such as generalized sparse matrix-matrix multiplication (SpGEMM) and sparse matrix-vector multiplication (SpMV) [7,8]. This approach is becoming more and more prominent [7], because the capability of using linear algebra can greatly simplify data analysis.…”

Section: Introduction (mentioning)

confidence: 99%
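As one illustration of recasting a graph algorithm as linear algebra, breadth-first search can be written as repeated SpMV over a Boolean semiring, with OR in place of addition and AND in place of multiplication. The sketch below assumes the transposed adjacency matrix is stored in CSR form, so that row v lists the in-neighbours of v. The function names and the five-vertex graph are hypothetical.

```python
def bool_spmv(indptr, indices, x):
    """y[v] = OR over in-neighbours u of x[u]: an SpMV on the (OR, AND) semiring."""
    return [
        any(x[indices[k]] for k in range(indptr[v], indptr[v + 1]))
        for v in range(len(indptr) - 1)
    ]


def bfs_levels(indptr, indices, source):
    """Return the BFS level of every vertex (-1 if unreachable)."""
    n = len(indptr) - 1
    level = [-1] * n
    level[source] = 0
    frontier = [v == source for v in range(n)]
    depth = 0
    while any(frontier):
        depth += 1
        reached = bool_spmv(indptr, indices, frontier)
        # Mask out vertices that already have a level.
        frontier = [reached[v] and level[v] < 0 for v in range(n)]
        for v in range(n):
            if frontier[v]:
                level[v] = depth
    return level


# Hypothetical graph with edges 0->1, 0->2, 1->3, 2->3, 3->4.
# Row v of the CSR structure holds the sources of edges that end at v.
indptr = [0, 0, 1, 2, 4, 5]
indices = [0, 0, 1, 2, 3]
levels = bfs_levels(indptr, indices, 0)  # [0, 1, 1, 2, 3]
```

The traversal logic reduces to one matrix-vector product and one mask per level, which is why optimizing the SpMV kernel carries over to this family of graph algorithms.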

Optimizing Sparse Matrix-Vector Multiplication for Large-Scale Data Analytics

Buono, Petrini, Checconi, et al. 2016

Proceedings of the 2016 International Conference on Supercomputing

Self Cite


Sparse Matrix-Vector multiplication (SpMV) is a fundamental kernel, used by a large class of numerical algorithms. Emerging big-data and machine learning applications are propelling a renewed interest in SpMV algorithms that can tackle massive amounts of unstructured data, rapidly approaching the terabyte range, with predictable, high performance. In this paper we describe a new methodology to design SpMV algorithms for shared memory multiprocessors (SMPs) that organizes the original SpMV algorithm into two distinct phases. In the first phase we build a scaled matrix that is reduced in the second phase, providing numerous opportunities to exploit memory locality. Using this methodology, we have designed two algorithms. Our experiments on irregular big-data matrices (an order of magnitude larger than the current state of the art) show a quasi-optimal scaling on a large-scale POWER8 SMP system, with an average performance speedup of 3.8×, when compared to an equally optimized version of the CSR algorithm. In terms of absolute performance, with our implementation, the POWER8 SMP system is comparable to a 256-node cluster. In terms of size, it can process matrices with up to 68 billion edges, an order of magnitude larger than state-of-the-art clusters.

CCS Concepts: •Computing methodologies → Linear algebra algorithms; Shared memory algorithms; Vector / streaming algorithms; •Mathematics of computing → Graph algorithms; •Theory of computation → Graph algorithms analysis; Data structures design and anal-

* Fabrizio Petrini has since changed his affiliation. His current contact is fabrizio.petrini@intel.com
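The two-phase organization described in the abstract can be sketched on a plain CSR matrix. This shows only the split into a scaling phase and a reduction phase. The two algorithms in the paper, their blocking, and their data layout are not described here, so everything below (function names, use of CSR, the 3×3 example) is an assumption for illustration.

```python
def scale_phase(indices, data, x):
    """Phase 1: multiply every nonzero by the matching entry of x.

    The result has one value per nonzero, i.e. a 'scaled' copy of the matrix.
    """
    return [value * x[col] for value, col in zip(data, indices)]


def reduce_phase(indptr, scaled):
    """Phase 2: sum the scaled nonzeros of each row to obtain y."""
    return [
        sum(scaled[indptr[row]:indptr[row + 1]])
        for row in range(len(indptr) - 1)
    ]


def two_phase_spmv(indptr, indices, data, x):
    return reduce_phase(indptr, scale_phase(indices, data, x))


# Hypothetical matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = two_phase_spmv(indptr, indices, data, x)  # [5.0, 6.0, 19.0]
```

Separating the phases isolates the irregular reads of `x` in phase 1, while phase 2 walks `scaled` sequentially. That separation is one plausible source of the locality opportunities the abstract mentions.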
