Better Size Estimation for Sparse Matrix Products

Amossen, Rasmus Resen; Campagna, Andrea; Pagh, Rasmus

doi:10.1007/978-3-642-15369-3_31

Cited by 15 publications

(9 citation statements)

References 16 publications


“…The second method, probabilistic method, estimates an imprecise nnz(C). This group of approaches [28,29,30] are based on random sampling and probability analysis on the input matrices. Since they do not guarantee a safe lower bound for the resulting matrix C and extra memory has to be allocated while the estimation fails, they were mostly used for estimating the shortest execution time of multiplication of multiple sparse matrices.…”

Section: Memory Pre-allocation For the Resulting Matrix (mentioning)

confidence: 99%
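To make the trade-off in this statement concrete, here is a minimal sketch of one very simple sampling-based estimator. It is not the method of [28,29,30] nor of Amossen, Campagna, and Pagh. It assumes a hypothetical representation in which `a_rows` is a list of column-index lists for A and `b_rows` maps a row index of B to its column indices, and it counts structural nonzeros only (numerical cancellation is ignored).

```python
import random

def exact_row_nnz(a_row_cols, b_rows):
    """Count distinct column indices in one row of C = A * B (structure only)."""
    cols = set()
    for k in a_row_cols:
        cols.update(b_rows.get(k, ()))
    return len(cols)

def estimate_nnz(a_rows, b_rows, sample_size, seed=0):
    """Estimate nnz(C) by scaling the exact count over a random sample of rows of A."""
    n = len(a_rows)
    if n == 0:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(range(n), min(sample_size, n))
    sampled = sum(exact_row_nnz(a_rows[i], b_rows) for i in sample)
    return sampled * n / len(sample)

# Hypothetical 4-row example; the exact structural nnz(C) is 6.
a_rows = [[0, 1], [1], [2], []]
b_rows = {0: [0, 1], 1: [1, 2], 2: [3]}
full = estimate_nnz(a_rows, b_rows, sample_size=4)
partial = estimate_nnz(a_rows, b_rows, sample_size=2)
```

With every row sampled the estimate equals the exact count. With a smaller sample it can fall below the true value, which is why a pre-allocation based on such an estimate may need extra memory afterwards.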

“…Algorithm 3 Pseudocode for the second stage on a CPU core.
1: for each entry u_i in U do
2: if u_i = 0 then ⊲ The 1st bin group
3: insert i to bin
…
else if u_i > 512 then ⊲ The 5th bin group
24: insert i to bin 37
25: nnz(c_i*) ← 256
26: end if
27: end for
28: nnz(C) ← nnz(c_i*)
… each delete-max step in our variant heapsort, the root node and the first entry of the resulting sequence are fused if they share the same index; otherwise the root node is inserted to the head part of the sequence. Our method is also distinguished from a heap-based sparse accumulator given by Gilbert et al. [43] by the mechanism of eliminating duplicate entries.…”

Section: Algorithm (mentioning)

confidence: 99%


A framework for general sparse matrix–matrix multiplication on GPUs and heterogeneous processors

Liu

Vinter

2015

Journal of Parallel and Distributed Computing


Highlights
• We design a framework for SpGEMM on modern manycore processors using the CSR format.
• We present a hybrid method for pre-allocating the resulting sparse matrix.
• We propose an efficient parallel insert method for long rows of the resulting matrix.
• We develop a heuristic-based load balancing strategy.
• Our approach significantly outperforms other known CPU and GPU SpGEMM methods.

Abstract
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in both input matrices.

In this work we propose a framework for SpGEMM on GPUs and emerging CPU-GPU heterogeneous processors. This framework particularly focuses on the above three problems. Memory pre-allocation for the resulting matrix is organized by a hybrid method that saves a large amount of global memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary arithmetic operations on the nonzero entries and is guaranteed in all stages.

Compared with the state-of-the-art CPU and GPU SpGEMM methods, our approach delivers excellent absolute performance and relative speedups on various benchmarks multiplying matrices with diverse sparsity structures. Furthermore, on heterogeneous processors, our SpGEMM approach achieves higher throughput by using re-allocatable shared virtual memory.
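As an illustration of the "number of necessary arithmetic operations" that the abstract mentions, the sketch below counts scalar multiplications per output row directly from CSR row pointers. This is one common way to count them and is an assumption here, not a confirmed detail of the authors' framework. The function name `row_upper_bounds` and the array names are hypothetical. Each count is also an upper bound on the number of nonzeros in that row of the result.

```python
def row_upper_bounds(a_indptr, a_indices, b_indptr):
    """For each row i of A, sum nnz(row k of B) over the nonzero columns k of that row."""
    n_rows = len(a_indptr) - 1
    u = [0] * n_rows
    for i in range(n_rows):
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k = a_indices[p]
            # Row k of B has b_indptr[k + 1] - b_indptr[k] stored entries.
            u[i] += b_indptr[k + 1] - b_indptr[k]
    return u

# Hypothetical 3-row example in CSR form (values are not needed for counting).
a_indptr = [0, 2, 3, 3]   # row 0 -> cols {0, 1}, row 1 -> col {1}, row 2 empty
a_indices = [0, 1, 1]
b_indptr = [0, 2, 4, 5]   # rows of B hold 2, 2 and 1 entries
u = row_upper_bounds(a_indptr, a_indices, b_indptr)
```

In this example `u` is `[4, 2, 0]`. If rows 0 and 1 of B share a column, row 0 of the product has fewer than 4 nonzeros, so allocating by `u` is safe but can waste memory.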


“…In contrast we show that surprisingly, at least to the authors, Õ(n/ϵ) bits of communication is possible with only 2 rounds. In [5], Amossen, Campagna, and Pagh improve the time complexity of [12], provided ϵ is not too small. However, a direct adaptation of this algorithm to the distributed model would result an even higher communication cost of Ω(n²).…”

Section: Related Work (mentioning)

confidence: 99%

Distributed Statistical Estimation of Matrix Products with Applications

Woodruff

Zhang

2018

Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems


We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix A and Bob holds a matrix B, and they want to estimate statistics of A · B. We focus on the well-studied ℓ_p-norm, distinct elements (p = 0), ℓ_0-sampling, and heavy hitter problems. The goal is to minimize both the communication cost and the number of rounds of communication.

This problem is closely related to the fundamental set-intersection join problem in databases: when p = 0 the problem corresponds to the size of the set-intersection join. When p = ∞ the output is simply the pair of sets with the maximum intersection size. When p = 1 the problem corresponds to the size of the corresponding natural join. We also consider the heavy hitters problem which corresponds to finding the pairs of sets with intersection size above a certain threshold, and the problem of sampling an intersecting pair of sets uniformly at random.
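The correspondence between the p = 0, 1, ∞ statistics and join sizes can be checked on a toy instance. The sketch below is a plain centralized computation, not the distributed protocol of the paper. It assumes that each row of A and each column of B is given as a set, so that entry (i, j) of A · B is the size of the intersection of set i and set j. The function name `product_statistics` is hypothetical.

```python
def product_statistics(a_sets, b_sets):
    """Return the (p=0, p=1, p=inf) statistics of the matrix of intersection sizes."""
    sizes = [len(a & b) for a in a_sets for b in b_sets]
    l0 = sum(1 for s in sizes if s > 0)   # number of intersecting pairs
    l1 = sum(sizes)                       # size of the natural join
    linf = max(sizes, default=0)          # largest intersection size
    return l0, l1, linf

# Hypothetical input: two sets held by Alice, three held by Bob.
a_sets = [{1, 2}, {3}]
b_sets = [{2, 3}, {4}, {1, 2}]
stats = product_statistics(a_sets, b_sets)
```

For this input `stats` is `(3, 4, 2)`: three pairs intersect, the intersections contain four elements in total, and the largest intersection has two elements.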


“…It is proportional to the join product of two matrix relations A and B with the condition A.col = B.row. The multiplication then rather turns into a relational join followed by a projection [10] where techniques of join size estimation (e.g., based on hashing [16]) can be applied to estimate the cost of the sparse algorithm.…”

Section: Architecture and Requirements (mentioning)

confidence: 99%
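The view of sparse multiplication as a join followed by a projection can be sketched as follows. This is a minimal illustration under assumed names: matrices are given as lists of (row, col) coordinate pairs, values are ignored, and `join_then_project` is hypothetical rather than part of the cited system.

```python
from collections import defaultdict

def join_then_project(a_entries, b_entries):
    """Join A and B on A.col = B.row, then project onto (A.row, B.col)."""
    b_by_row = defaultdict(list)
    for k, j in b_entries:
        b_by_row[k].append(j)
    join_size = 0
    projected = set()
    for i, k in a_entries:
        for j in b_by_row.get(k, ()):
            join_size += 1            # one joined tuple = one scalar multiplication
            projected.add((i, j))     # projection removes duplicate (i, j) pairs
    return join_size, len(projected)

# Hypothetical coordinate lists for A and B.
a_entries = [(0, 0), (0, 1), (1, 1)]
b_entries = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3)]
result = join_then_project(a_entries, b_entries)
```

Here `result` is `(6, 5)`: the join produces six tuples, which is the work done by the sparse algorithm, while the projection keeps five distinct positions, which is the number of structural nonzeros in the product.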

Bringing Linear Algebra Objects to Life in a Column-Oriented In-Memory Database

Kernert

Köhler

Lehner

2015

In Memory Data Management and Analysis


Abstract. Large numeric matrices and multidimensional data arrays appear in many science domains, as well as in applications of financial and business warehousing. Common applications include eigenvalue determination of large matrices, which decompose into a set of linear algebra operations. With the rise of in-memory databases it is now feasible to execute these complex analytical queries directly in the database without being restricted by hard disc latencies for random accesses. In this paper, we present a way to integrate linear algebra operations and large matrices as first class citizens into an in-memory database following a two-layered architectural model. The architecture consists of a logical component receiving manipulation statements and linear algebra expressions, and of a physical layer, which autonomously administrates multiple matrix storage representations. A cost-based hybrid storage representation is presented and an experimental implementation is evaluated for matrix-vector multiplications.

