In this paper we tackle the inversion of large-scale dense matrices via conventional matrix factorizations (LU, Cholesky, LDL^T) and Gauss-Jordan elimination (GJE) on hybrid platforms consisting of a multi-core CPU and a many-core graphics processor (GPU). Specifically, we introduce the different matrix inversion algorithms using a unified framework based on the notation of the FLAME project; we develop hybrid implementations of the matrix operations underlying these algorithms, as alternatives to those in existing libraries for single-GPU systems; and we perform an extensive experimental study on a platform equipped with state-of-the-art general-purpose architectures from Intel and a "Fermi" GPU from NVIDIA that exposes the efficiency of the different inversion approaches. Our study and experimental results show the simplicity and performance advantages of the GJE-based inversion methods, as well as the difficulties posed by the symmetric indefinite case.
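To make the GJE-based approach concrete, the following is a minimal textbook sketch of in-place matrix inversion via Gauss-Jordan elimination with partial pivoting, in plain Python. It illustrates only the unblocked idea; the paper's contribution is blocked, hybrid CPU/GPU variants of this scheme, which this sketch does not attempt to reproduce.

```python
def gje_invert(A):
    """Invert a dense matrix in place via Gauss-Jordan elimination
    with partial pivoting. A is a list of row lists; returns A,
    overwritten with its inverse. Unblocked textbook sketch only."""
    n = len(A)
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest entry in column k to row k.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        perm[k], perm[p] = perm[p], perm[k]
        piv = A[k][k]
        # Scale the pivot row; the pivot slot receives 1/piv so the
        # inverse is accumulated in place of A.
        A[k] = [a / piv for a in A[k]]
        A[k][k] = 1.0 / piv
        for i in range(n):
            if i != k:
                m = A[i][k]
                A[i] = [a - m * b for a, b in zip(A[i], A[k])]
                A[i][k] = -m / piv
    # Row swaps during elimination become column swaps of the inverse.
    for row in A:
        tmp = row[:]
        for j in range(n):
            row[perm[j]] = tmp[j]
    return A
```

The appeal of GJE for inversion, which the paper exploits, is that each step applies one homogeneous rank-1 (or, in blocked form, rank-b) update to the whole matrix, which maps well onto GPUs.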
The solution of linear systems is a recurrent operation in scientific and engineering applications, traditionally addressed via the LU factorization. The Gauss-Huard (GH) algorithm has been introduced as an efficient alternative on modern platforms equipped with accelerators, although this approach presented some functional constraints. In particular, it was not possible to reuse part of the computations in the solution of delayed linear systems or in the inversion of the matrix. Here, we adapt GH to overcome these two deficiencies, yielding new algorithms that exhibit the same computational cost as their counterparts based on the LU factorization of the matrix. We evaluate the novel GH extensions on the solution of Lyapunov matrix equations via the LRCF-ADI method, validating our approach with experiments on three benchmarks from model order reduction.

Figure 2. Blocked Gauss-Huard (GH) algorithm for the solution of Ax = b. On entry, Â = [A; b], and upon completion, the last column of Â is overwritten with the solution x.

Figure 3. Unblocked algorithm for the reutilization of Gauss-Huard (GH) in the solution of Ax = b. On entry, Ā is the matrix resulting from the application of the GH algorithm to A, and upon completion, b is overwritten with the solution x.

A related problem appears when A has been employed to solve a linear system via GH, so that its contents have been overwritten with the factorization, and the inverse of this matrix is required next. Under special circumstances this scenario can be of interest, for example, to avoid explicit multiplication with the matrix inverse in the solution of Lyapunov equations via the matrix sign function [10].
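The core of the GH algorithm referenced above can be sketched as follows: the augmented matrix [A; b] is processed one row at a time, and a Jordan-style elimination above the diagonal removes the need for backward substitution, at the same cost as an LU-based solve. This is a plain, unblocked sketch without the column pivoting of Huard's original method, and it does not include the reuse extensions the paper develops.

```python
def gauss_huard(Ab):
    """Unblocked Gauss-Huard elimination on the augmented matrix
    [A | b], stored as n rows of length n+1. On return, the last
    column holds the solution x. No pivoting, for clarity."""
    n = len(Ab)
    for k in range(n):
        # Update row k with the transformations from rows 0..k-1.
        for j in range(k, n + 1):
            Ab[k][j] -= sum(Ab[k][i] * Ab[i][j] for i in range(k))
        # Scale row k so that its pivot becomes 1.
        piv = Ab[k][k]
        for j in range(k, n + 1):
            Ab[k][j] /= piv
        # Jordan-style step: annihilate column k *above* the diagonal,
        # so no backward substitution is needed at the end.
        for i in range(k):
            m = Ab[i][k]
            for j in range(k + 1, n + 1):
                Ab[i][j] -= m * Ab[k][j]
    return [Ab[i][n] for i in range(n)]
```

The deficiency the abstract addresses is visible here: the elimination overwrites the augmented matrix with its transformed entries, and in the original GH formulation those entries could not be reused to solve a later system with a different b or to form A's inverse.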
The solution of linear systems of equations with many right-hand sides is mostly regarded as a trivial extension of solving a single linear system, and algorithmic developments have largely focused on the efficient computation of the LU decomposition. This view does not, however, cover the case where many right-hand sides increase the runtime impact of the forward/backward substitution. In this contribution we present a GPU-accelerated, Gauss-Jordan-elimination-based all-at-once solution scheme that minimizes both runtime and energy consumption by replacing the forward/backward substitution with an operation better suited to the hardware. We obtain a multi-GPU-aware algorithm that is up to 2.5 times faster than the current state-of-the-art LU-based solution process of MAGMA and saves 48% of the required energy.
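The all-at-once idea can be illustrated with a small sketch: all right-hand sides are appended to A, and Gauss-Jordan elimination reduces [A | B] to [I | X] in one pass, so the triangular solves of an LU-based approach never occur. This is only a serial, unblocked illustration of the principle; the paper's contribution is a blocked, multi-GPU realization of it.

```python
def gauss_jordan_solve(A, B):
    """Solve A X = B for all right-hand sides at once by reducing
    the augmented matrix [A | B] to [I | X] with Gauss-Jordan
    elimination and partial pivoting. A is n x n, B is n x m,
    both as lists of row lists; returns X as an n x m list."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]  # augment: n x (n+m)
    for k in range(n):
        # Partial pivoting on column k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        # Eliminate column k both above and below the pivot row;
        # this is what replaces the forward/backward substitution.
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [v - f * w for v, w in zip(M[i], M[k])]
    return [row[n:] for row in M]
```

Note the trade-off the abstract alludes to: Gauss-Jordan performs more flops on A than an LU factorization, but every right-hand side is handled by the same uniform matrix updates, which keep the GPUs busy instead of serializing in triangular solves.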