Parallelization and locality optimization of affine loop nests has been successfully addressed for shared-memory machines. However, many large-scale simulation applications must be executed in a distributed-memory environment, and use irregular/sparse computations where the control-flow and array-access patterns are data-dependent. In this paper, we propose an approach for effective parallel execution of a class of irregular loop computations in a distributed-memory environment, using a combination of static and runtime analysis. We discuss algorithms that analyze sequential code to generate an inspector and an executor. The inspector captures the data-dependent behavior of the computation in parallel, without requiring complete replication of any of the data structures used in the original computation. The executor performs the computation in parallel. The effectiveness of the framework is demonstrated on several benchmarks and a climate modeling application.
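To make the inspector/executor idea concrete, here is a minimal single-process C++ sketch (all names are invented for illustration; the paper's framework generates parallel MPI code rather than the sequential simulation shown here). The inspector pass records which elements each simulated process would read and write through the indirection arrays, and the executor pass then performs that process's share of the iterations:

```cpp
// Minimal sketch of the inspector/executor pattern on an irregular loop.
// Hypothetical single-node simulation; the real framework runs the
// inspector itself in parallel across distributed-memory processes.
#include <cstdio>
#include <set>
#include <vector>

int main() {
    // Irregular computation: y[col[k]] += a[k] * x[row[k]].
    // row/col are indirection arrays, so accesses are data-dependent.
    std::vector<int>    row = {0, 2, 1, 3, 2};
    std::vector<int>    col = {1, 0, 3, 2, 1};
    std::vector<double> a   = {1, 2, 3, 4, 5};
    std::vector<double> x   = {1, 1, 1, 1}, y(4, 0.0);

    const int nprocs = 2;  // simulated processes
    for (int p = 0; p < nprocs; ++p) {
        // Inspector: walk the iterations assigned to "process" p and
        // record which elements of x and y it will touch, without
        // replicating the full index arrays on every process.
        std::set<int> reads, writes;
        for (size_t k = p; k < a.size(); k += nprocs) {
            reads.insert(row[k]);
            writes.insert(col[k]);
        }
        printf("proc %d: %zu reads, %zu writes\n",
               p, reads.size(), writes.size());

        // Executor: perform only this process's iterations; in the
        // real framework these operate on locally gathered copies.
        for (size_t k = p; k < a.size(); k += nprocs)
            y[col[k]] += a[k] * x[row[k]];
    }
    for (double v : y) printf("%g ", v);
    printf("\n");
    return 0;
}
```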
Sparse matrix-vector multiplication (SpMV) is one of the key operations in linear algebra. Overcoming thread divergence, load imbalance, and non-coalesced, indirect memory access caused by sparsity and irregularity are the main challenges to optimizing SpMV on GPUs. In this paper we present a new blocked row-column (BRC) storage format with a novel two-dimensional blocking mechanism that effectively addresses these challenges: it reduces thread divergence by reordering and grouping rows of the input matrix with a nearly equal number of non-zero elements onto the same execution units (i.e., warps), and improves load balance by partitioning rows into blocks with a constant number of non-zeros so that different warps perform the same amount of work. We also present an efficient autotuning technique that optimizes BRC performance by judicious selection of the block size based on the sparsity characteristics of the matrix. A CUDA implementation of BRC outperforms the NVIDIA CUSP and cuSPARSE libraries and other state-of-the-art SpMV formats on a range of unstructured sparse matrices from multiple application domains. The BRC format has been integrated with PETSc, enabling its use in PETSc's solvers.
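As a rough illustration of the blocking idea, the following C++ sketch (a host-side simplification with invented names, not the paper's CUDA implementation) reorders rows by nonzero count and pads each small group of rows to the group's widest row before performing the SpMV, so rows with similar work land on the same group:

```cpp
// Simplified sketch of a BRC-style layout, executed on the CPU;
// the real format adds 2D blocking by nonzero count and a GPU kernel.
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
    // Toy 4x4 matrix in CSR form.
    std::vector<int>    ptr = {0, 3, 4, 6, 7};
    std::vector<int>    idx = {0, 1, 3, 2, 0, 1, 3};
    std::vector<double> val = {1, 2, 3, 4, 5, 6, 7};
    std::vector<double> x   = {1, 2, 3, 4}, y(4, 0.0);
    const int n = 4, GROUP = 2;  // GROUP stands in for the warp size

    // Step 1: reorder rows so rows with similar nonzero counts are
    // grouped onto the same execution unit, reducing divergence.
    std::vector<int> perm(n);
    for (int i = 0; i < n; ++i) perm[i] = i;
    std::sort(perm.begin(), perm.end(), [&](int a, int b) {
        return ptr[a + 1] - ptr[a] > ptr[b + 1] - ptr[b];
    });

    // Step 2: for each group of GROUP rows, pad to the group's widest
    // row (ELLPACK-like within the group) and run the SpMV.
    for (int g = 0; g < n; g += GROUP) {
        int width = 0;
        for (int i = g; i < g + GROUP; ++i)
            width = std::max(width, ptr[perm[i] + 1] - ptr[perm[i]]);
        for (int i = g; i < g + GROUP; ++i) {  // one "lane" per row
            int r = perm[i];
            for (int j = 0; j < width; ++j) {
                int k = ptr[r] + j;
                if (k < ptr[r + 1]) y[r] += val[k] * x[idx[k]];
            }
        }
    }
    for (double v : y) printf("%g ", v);  // expects 17 12 17 28
    printf("\n");
    return 0;
}
```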
Many scientific applications spend significant time within loops that are parallel except for dependences arising from associative reduction operations. However, these loops often contain data-dependent control-flow and array-access patterns. Traditional optimizations that rely on purely static analysis fail to generate parallel code in such cases. This article proposes an approach for automatic parallelization for distributed-memory environments that uses both static and runtime analysis. We formalize the computations targeted by this approach and develop algorithms to detect them. We also describe algorithms to generate a parallel inspector that performs a runtime analysis of control-flow and array-access patterns, and a parallel executor that takes advantage of this information. The effectiveness of the approach is demonstrated on several benchmarks that were automatically transformed using a prototype compiler; for these, inspector overheads and executor performance were measured. The benefit on real-world applications is also demonstrated through similar manual transformations of atmospheric modeling software.
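For intuition, a toy instance of the targeted loop class might look like the following C++ fragment (an illustrative example, not taken from the article): the outer iterations are independent apart from an associative reduction, while the inner trip counts and array accesses are known only at runtime:

```cpp
// Toy member of the targeted loop class: parallel across i except for
// the associative += reduction, with indirect, data-dependent accesses
// through the hypothetical offs/cols arrays.
#include <cstdio>
#include <vector>

int main() {
    std::vector<int>    offs = {0, 2, 3, 5};  // per-row start offsets
    std::vector<int>    cols = {0, 2, 1, 0, 2};
    std::vector<double> vals = {1, 2, 3, 4, 5};
    std::vector<double> x    = {1, 2, 3};

    double norm = 0.0;               // associative reduction
    for (int i = 0; i < 3; ++i) {    // iterations are independent...
        double s = 0.0;
        for (int j = offs[i]; j < offs[i + 1]; ++j)  // ...but the trip
            s += vals[j] * x[cols[j]];               // count and the
        norm += s * s;     // accesses depend on runtime data
    }
    printf("norm = %g\n", norm);     // 49 + 36 + 361 = 446
    return 0;
}
```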