Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained tasks suitable for distributed computers with traditional processing cores. However, accelerating multigrid methods on massively parallel throughput-oriented processors, such as graphics processing units, demands algorithms with abundant fine-grained parallelism. In this paper, we develop a parallel algebraic multigrid method that exposes substantial fine-grained parallelism in both the construction of the multigrid hierarchy and the cycling (solve) stage. Our algorithms are expressed in terms of scalable parallel primitives that are efficiently implemented on the GPU. The resulting solver achieves an average speedup of 1.8× in the setup phase and 5.7× in the cycling phase when compared to a representative CPU implementation.
Sparse matrix--matrix multiplication (SpGEMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpGEMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpGEMM operation into three highly parallel phases: expansion, sorting, and contraction, and introduce a set of complementary bandwidth-saving performance optimizations. Our implementation is fully general and our optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.
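The three phases named above can be illustrated with a minimal NumPy sketch on the CPU. It assumes both matrices are given in coordinate (COO) form; the function name `spgemm_esc` and its argument names are hypothetical, and the sketch does not include the row-wise adaptive optimizations or any GPU-specific detail from the abstract.

```python
import numpy as np

def spgemm_esc(a_rows, a_cols, a_vals, b_rows, b_cols, b_vals, n_inner):
    """Sketch of C = A @ B in COO form via expansion, sorting, contraction.

    n_inner is the number of rows of B (equal to the number of columns of A).
    All names here are illustrative assumptions, not an existing library API.
    """
    a_rows, a_cols, a_vals = map(np.asarray, (a_rows, a_cols, a_vals))
    b_rows, b_cols, b_vals = map(np.asarray, (b_rows, b_cols, b_vals))

    # Group the entries of B by row and build a CSR-style row pointer.
    order = np.argsort(b_rows, kind="stable")
    b_rows, b_cols, b_vals = b_rows[order], b_cols[order], b_vals[order]
    b_ptr = np.zeros(n_inner + 1, dtype=np.int64)
    np.add.at(b_ptr, b_rows + 1, 1)
    b_ptr = np.cumsum(b_ptr)

    # Expansion: each nonzero A[i, k] is paired with every nonzero B[k, j].
    counts = b_ptr[a_cols + 1] - b_ptr[a_cols]
    total = int(counts.sum())
    if total == 0:
        empty = np.array([], dtype=np.int64)
        return empty, empty, np.array([], dtype=float)
    src = np.repeat(np.arange(len(a_vals)), counts)   # index of the A entry
    starts = np.cumsum(counts) - counts               # first product per A entry
    offs = np.arange(total) - np.repeat(starts, counts)
    b_idx = b_ptr[a_cols[src]] + offs                 # index of the B entry
    rows = a_rows[src]
    cols = b_cols[b_idx]
    vals = a_vals[src] * b_vals[b_idx]

    # Sorting: order the intermediate products by (row, column).
    perm = np.lexsort((cols, rows))
    rows, cols, vals = rows[perm], cols[perm], vals[perm]

    # Contraction: sum runs of products that share the same (row, column).
    is_new = np.ones(total, dtype=bool)
    is_new[1:] = (rows[1:] != rows[:-1]) | (cols[1:] != cols[:-1])
    seg = np.flatnonzero(is_new)
    return rows[seg], cols[seg], np.add.reduceat(vals, seg)
```

Each phase is a data-parallel pattern (gather and repeat, sort by key, segmented reduction), which is why a decomposition of this kind suits throughput-oriented hardware. The intermediate product list can be much larger than the final result, which is the memory-traffic cost that bandwidth-saving optimizations target.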
Algebraic multigrid methods solve sparse linear systems Ax = b by automatic construction of a multilevel hierarchy. This hierarchy is defined by grid transfer operators that must accurately capture algebraically smooth error relative to the relaxation method. We propose a methodology to improve grid transfers through energy minimization. The proposed strategy is applicable to Hermitian, non-Hermitian, definite, and indefinite problems. Each column of the grid transfer operator P is minimized in an energy-based norm while enforcing two types of constraints: a defined sparsity pattern and preservation of specified modes in the range of P. A Krylov-based strategy is used to minimize energy, which is equivalent to solving AP_j = 0 for each column j of P, with the constraints ensuring a nontrivial solution. For the Hermitian positive definite case, a conjugate gradient (CG-)based method is utilized to construct grid transfers, while methods based on generalized minimum residual (GMRES) and CG on the normal equations (CGNR) are explored for the general case. The approach is flexible, allowing for arbitrary coarsenings, unrestricted sparsity patterns, straightforward long-distance interpolation, and general use of constraints, either user-defined or auto-generated. We conclude with numerical evidence in support of the proposed framework.
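The constrained minimization described above can be made concrete on a tiny symmetric positive definite example. The sketch below is an illustration under stated assumptions: it handles a single preserved mode, uses dense NumPy arrays, and swaps in a direct solve of the KKT (Lagrange multiplier) system in place of the Krylov iteration the abstract describes. The names `energy_min_prolongator`, `pattern`, `Bc`, and `B` are hypothetical.

```python
import numpy as np

def energy_min_prolongator(A, pattern, Bc, B):
    """Minimize sum_j P_j^T A P_j over the entries allowed by `pattern`,
    subject to P @ Bc = B (one preserved mode). Assumes A is SPD and every
    row of `pattern` has at least one allowed entry with Bc nonzero.
    Solved directly through the KKT system, not by a Krylov method.
    """
    n, nc = pattern.shape
    free = np.argwhere(pattern)              # allowed (i, j) positions of P
    fi, fj = free[:, 0], free[:, 1]
    m = len(fi)

    # Energy Hessian on the free entries: P[i, j] and P[k, l] interact
    # through A[i, k], and only when they lie in the same column (j == l).
    H = A[np.ix_(fi, fi)] * (fj[:, None] == fj[None, :])

    # One constraint per fine node i: sum_j P[i, j] * Bc[j] = B[i].
    C = np.zeros((n, m))
    C[fi, np.arange(m)] = Bc[fj]

    # KKT system [[H, C^T], [C, 0]] [p; lam] = [0; B].
    K = np.block([[H, C.T], [C, np.zeros((n, n))]])
    rhs = np.concatenate([np.zeros(m), B])
    p = np.linalg.solve(K, rhs)[:m]

    P = np.zeros((n, nc))
    P[fi, fj] = p
    return P

def energy(A, P):
    """Total energy: the sum of P_j^T A P_j over the columns of P."""
    return np.trace(P.T @ A @ P)

# 1D Poisson matrix, two aggregates of three nodes, constant mode preserved.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
T = np.zeros((n, 2))
T[:3, 0] = 1.0
T[3:, 1] = 1.0                       # tentative (piecewise-constant) operator
pattern = (np.abs(A) @ T) > 0        # allow one layer of overlap
Bc = np.ones(2)
B = T @ Bc
P = energy_min_prolongator(A, pattern, Bc, B)
```

Because the tentative operator `T` satisfies both constraints, the minimizer `P` can have no more energy than `T`, while still reproducing the mode `B` exactly. A practical solver would avoid forming the dense KKT matrix and would instead iterate in the constrained space, as the abstract indicates.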