Qingpeng Niu scite author profile

Sparse matrix-matrix multiplication (SpGEMM) is an important primitive for many data analytics algorithms, such as Markov clustering. Unlike the dense case, where performance of matrix-matrix multiplication is considerably higher than matrix-vector multiplication, the opposite is true for the sparse case on GPUs. A significant challenge is that the sparsity structure of the output sparse matrix is not known a priori, and many additive contributions must be combined to generate its non-zero elements. We use synthetic matrices to characterize the effectiveness of alternate approaches and devise a hybrid approach that is demonstrated to be consistently superior to other available GPU SpGEMM implementations.
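
The challenge named in the abstract above (output sparsity unknown in advance, many additive contributions per non-zero) can be illustrated with a minimal row-wise SpGEMM sketch. This is not the paper's hybrid GPU method; it is a plain CPU illustration in Python, and the dictionary-of-rows storage and the name `spgemm_rowwise` are assumptions made only for this example.

```python
from collections import defaultdict


def spgemm_rowwise(a_rows, b_rows):
    """Multiply two sparse matrices stored as {row: {col: value}}.

    Hypothetical helper for illustration only. Each output row is built
    in a hash-map accumulator because its non-zero columns are not known
    until every contribution a_ik * b_kj has been visited.
    """
    c_rows = {}
    for i, a_row in a_rows.items():
        acc = defaultdict(float)  # accumulator for output row i
        for k, a_ik in a_row.items():
            # Row k of B scatters into row i of C, scaled by a_ik.
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] += a_ik * b_kj  # combine additive contributions
        if acc:
            c_rows[i] = dict(acc)
    return c_rows


# Small example: two contributions land on the same output entry C[0][1].
a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
b = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
c = spgemm_rowwise(a, b)
print(c)
```

In the example, the entry in row 0, column 1 of the result receives one contribution through k = 0 and another through k = 2, which is the kind of merging the abstract refers to.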


A fast implementation of MLR-MCL algorithm on multi-core processors

Niu, Lai, Faisal, et al. 2014


Widespread use of stochastic flow based graph clustering algorithms, e.g. Markov Clustering (MCL), has been hampered by their lack of scalability and fragmentation of output. Multi-Level Regularized Markov Clustering (MLR-MCL) is an improvement over MCL, providing faster performance and better quality of clusters for large graphs. However, a closer look at MLR-MCL's performance reveals potential for further improvement. In this paper we present a fast parallel implementation of the MLR-MCL algorithm via static work partitioning based on analysis of memory footprints. By parallelizing the most time-consuming region of the sequential MLR-MCL algorithm, we report up to 10.43x (5.22x on average) speedup on CPU, using 8 datasets from SNAP and 3 PPI datasets. In addition, our algorithm can be adapted to perform general sparse matrix-matrix multiplication (SpGEMM), and our experimental evaluation shows up to 3.50x (1.92x on average) speedup on CPU, and up to 5.12x (2.20x on average) speedup on MIC, compared to the SpGEMM kernel provided by Intel Math Kernel Library (MKL).


Global‐view coefficients: a data management solution for parallel quantum Monte Carlo applications

Niu, Dinan, Tirukkovalur, et al. 2016

Concurrency and Computation


Summary: Quantum Monte Carlo (QMC) applications perform simulation with respect to an initial state of the quantum mechanical system, which is often captured by using a cubic B-spline basis. This representation is stored as a read-only table of coefficients, and accesses to the table are generated at random as part of the Monte Carlo simulation. Current QMC applications, such as QWalk and QMCPACK, replicate this table at every process or node, which limits scalability because increasing the number of processors does not enable larger systems to be run. We present a partitioned global address space approach to transparently managing this data using Global Arrays in a manner that allows the memory of multiple nodes to be aggregated. We develop an automated data management system that significantly reduces communication overheads, enabling new capabilities for QMC codes. Experimental results with QWalk and QMCPACK demonstrate the effectiveness of the data management system.


A global address space approach to automated data management for parallel Quantum Monte Carlo applications

Niu, Dinan, Tirukkovalur, et al. 2012


Quantum Monte Carlo (QMC) applications perform simulation with respect to an initial state of the quantum mechanical system, which is often captured by using a cubic B-spline basis. This representation is stored as a read-only table of coefficients, and accesses to the table are generated at random as part of the Monte Carlo simulation. Current QMC applications, such as QWalk and QMCPACK, replicate this table at every process or node, which limits scalability because increasing the number of processors does not enable larger systems to be run. We present a partitioned global address space (PGAS) approach to transparently managing this data using Global Arrays in a manner that allows the memory of multiple nodes to be aggregated. We develop an automated data management system that significantly reduces communication overheads, enabling new capabilities for QMC codes. Experimental results with the QWalk application demonstrate the effectiveness of the data management system.




Qingpeng Niu

PARDA: A Fast Parallel Reuse Distance Analysis Algorithm

On improving performance of sparse matrix-matrix multiplication on GPUs

A fast implementation of MLR-MCL algorithm on multi-core processors

Global‐view coefficients: a data management solution for parallel quantum Monte Carlo applications

A global address space approach to automated data management for parallel Quantum Monte Carlo applications
