Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems 2017
DOI: 10.1145/3148226.3148237

Investigating half precision arithmetic to accelerate dense linear system solvers

Abstract: The use of low-precision arithmetic in mixed-precision computing methods has been a powerful tool to accelerate numerous scientific computing applications. Artificial intelligence (AI) in particular has pushed this to current extremes, making use of half-precision floating-point arithmetic (FP16) in approaches based on neural networks. The appeal of FP16 is the high performance that can be achieved with it on today's powerful manycore GPU accelerators, e.g., the NVIDIA V100, which can provide 120 TeraF…
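
A minimal NumPy/SciPy sketch of the mixed-precision iterative-refinement idea described in the abstract (not the authors' GPU implementation): the matrix is rounded to FP16 before factorization to emulate a low-precision LU (LAPACK exposes no FP16 kernels), while the residual and the solution update are carried in FP64. The function name, tolerance, and test matrix are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-12, max_iter=50):
    """Sketch of iterative refinement with a low-precision factorization."""
    # Emulate an FP16 factorization: round A to half precision, then
    # factor in single precision (LAPACK has no FP16 LU kernel).
    A_lo = A.astype(np.float16).astype(np.float32)
    lu, piv = lu_factor(A_lo)

    # Initial solve with the low-precision factors, then refine in FP64.
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                                   # residual in FP64
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        d = lu_solve((lu, piv), r.astype(np.float32))   # correction via LP factors
        x = x + d.astype(np.float64)
    return x

# Usage on a well-conditioned random system (illustrative).
rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # relative residual, below tol at convergence
```

Even though the factorization carries only about three decimal digits of accuracy, the refinement loop recovers a solution accurate to the working (FP64) precision for well-conditioned systems, which is the effect the paper exploits to gain speed from FP16 hardware.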

Cited by 52 publications (32 citation statements)
References 17 publications
“…A first step in this direction is the recent performance study of Haidar et al [16], which shows promising results.…”
Section: Discussion (mentioning)
confidence: 99%
“…An investigation of similar iterative refinement methods on earlier generations of GPUs can be found in [11]. With the announcement of NVIDIA's V100 Tensor Cores.…”
Section: Related Work (mentioning)
confidence: 99%
“…Compared to our previous work in [11], the primary contribution of this paper is to propose and implement a high-performance framework for the mixed-precision iterative refinement solvers that makes use for the first time of GPU Tensor Core-accelerated FP16-TC. To this end, we will:…”
Section: Contributions (mentioning)
confidence: 99%
“…While in many scientific applications the use of double-precision floating-point is most common, this precision is not always required. For example, iterative methods can exhibit resilience against low precision arithmetic as has been shown for the computation of inverse matrix roots [Lass et al 2018a] and for solving systems of linear equations [Angerer et al 2016; Haidar et al 2017, 2018; Klavík et al 2014]. Mainly driven by the growing popularity of artificial neural networks [Gupta et al 2015], we can observe growing support of low-precision data types in hardware accelerators.…”
Section: Approximate Computing (mentioning)
confidence: 99%