2018
DOI: 10.1137/17m1140819
Accelerating the Solution of Linear Systems by Iterative Refinement in Three Precisions

Abstract: We propose a general algorithm for solving an n × n nonsingular linear system Ax = b based on iterative refinement with three precisions. The working precision is combined with possibly different precisions for solving for the correction term and for computing the residuals. Via rounding error analysis of the algorithm we derive sufficient conditions for convergence and bounds for the attainable forward error and normwise and componentwise backward errors. Our results generalize and unify many ex…

Cited by 147 publications (173 citation statements)
References 32 publications
“…Replacing the direct triangular solves of the correction equation with an iterative method, as suggested in [4] in a mixed precision context, leads to "nesting" of two iterative methods, in general called "inner-outer" iterations, which have been studied both theoretically and computationally [9], [21], [23], including in mixed-precision computation scenarios [2]. Recently, Carson and Higham [4], [5] analyzed the convergence of a three-precision iterative refinement scheme (factorization precision, working precision, residual precision) and concluded that if the condition number of A is not too large, κ∞(A) = ‖A‖∞‖A⁻¹‖∞ < 10⁴, then using FP16 for the O(n³) portion (the LU factorization) and (FP32, FP64) or (FP64, FP128) as the (working, residual) precisions for the O(n²) portion (the refinement loop), one can expect forward and backward errors on the order of 10⁻⁸ and 10⁻¹⁶, respectively. We note that if x̂ is the computed solution of Ax = b, the forward error is defined as ‖x̂ − x‖∞/‖x‖∞ and the backward error as ‖r‖₂/(‖A‖₂‖x̂‖₂), where r = b − Ax̂.…”
Section: Related Work
confidence: 99%
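The three-precision scheme described in that statement can be sketched in NumPy. This is a minimal illustration under simplifying assumptions, not the authors' implementation: no pivoting, a fixed iteration count rather than a convergence test, and the names `lu_factor`, `lu_solve`, and `ir3` are hypothetical.

```python
import numpy as np

def lu_factor(A):
    """Naive in-place LU factorization without pivoting (sketch only;
    real codes use partial pivoting)."""
    n = A.shape[0]
    LU = A.copy()
    for k in range(n - 1):
        LU[k + 1:, k] /= LU[k, k]
        LU[k + 1:, k + 1:] -= np.outer(LU[k + 1:, k], LU[k, k + 1:])
    return LU

def lu_solve(LU, b):
    """Forward substitution (unit lower) then back substitution."""
    n = LU.shape[0]
    x = b.copy()
    for i in range(n):
        x[i] -= LU[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):
        x[i] = (x[i] - LU[i, i + 1:] @ x[i + 1:]) / LU[i, i]
    return x

def ir3(A, b, n_iter=10):
    """Iterative refinement in three precisions:
    factorization in FP16, working precision FP32, residuals in FP64."""
    LU16 = lu_factor(A.astype(np.float16))    # O(n^3) work in low precision
    LU32 = LU16.astype(np.float32)            # apply factors in working precision
    x = lu_solve(LU32, b.astype(np.float32))
    A64, b64 = A.astype(np.float64), b.astype(np.float64)
    for _ in range(n_iter):
        r = b64 - A64 @ x.astype(np.float64)          # residual in high precision
        d = lu_solve(LU32, r.astype(np.float32))      # correction via fp16 factors
        x = (x + d).astype(np.float32)                # update in working precision
    return x
```

For a well-conditioned system the low-precision factorization error is corrected by the refinement loop, and the final accuracy is limited by the working precision rather than FP16.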
“…The convergence tolerance is chosen on the order of the unit roundoff of the low-precision arithmetic used during the factorization (e.g., we use 10⁻⁴ or 10⁻⁸ when the LU factorization is in FP16 or FP32, respectively). Since this paper focuses on practical usage and possible performance gains rather than error analysis, we point the reader to [4], [5] for detailed error analysis of the IR and IRGM techniques.…”
Section: Background
confidence: 99%
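The tolerances 10⁻⁴ and 10⁻⁸ quoted above match the unit roundoff u = ε/2 of the respective IEEE formats, which can be checked directly (a small sketch; the mapping from format to tolerance is as stated in the quoted text):

```python
import numpy as np

# Unit roundoff u = machine epsilon / 2 for each IEEE floating-point format.
# The quoted stopping tolerances (1e-4 for FP16, 1e-8 for FP32) are of the
# same order of magnitude as u for the factorization precision.
for dtype, name in [(np.float16, "fp16"), (np.float32, "fp32"), (np.float64, "fp64")]:
    u = np.finfo(dtype).eps / 2
    print(f"{name}: u = {u:.2e}")
```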
“…Equation (11) gives the preconditioner memory transfers, i.e., the data transfers (from memory) per iteration, where fpxx_i denotes the precision format selected for the ith diagonal block of the preconditioner. The data transfer volume of the block-Jacobi preconditioner thus depends on the format employed to store the block inverse.…”
Section: Energy Model
confidence: 99%
“…To avoid the previous two pitfalls, in our final experiment, we compute the total data transfers of a single iteration of the PCG method with the block-Jacobi preconditioner stored in fp64, fp32, fp16, or adaptive precision, see Equation (11). To obtain an estimated total data transfer volume, we then combine the data transfer volume per iteration with the number of iterations needed to reach convergence in each case, ignoring those cases for which half precision does not converge.…”
Section: Energy Model
confidence: 99%
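The accounting described in these two statements, per-iteration transfer volume as a function of per-block storage format, multiplied by the iteration count to convergence, can be sketched as a simple cost model. This is a hypothetical model for illustration, not Equation (11) from the cited paper: it assumes each n_i × n_i inverse block is streamed from memory once per iteration, and the function names are invented.

```python
# Storage cost in bytes per matrix entry for each precision format.
BYTES = {"fp64": 8, "fp32": 4, "fp16": 2}

def preconditioner_traffic(block_sizes, formats):
    """Bytes read per iteration to apply a block-Jacobi preconditioner whose
    ith diagonal block inverse (size n_i x n_i) is stored in formats[i]."""
    return sum(BYTES[f] * n * n for n, f in zip(block_sizes, formats))

def total_traffic(block_sizes, formats, iterations):
    """Per-iteration volume times the number of iterations to convergence,
    mirroring the experiment that combines the per-iteration model with the
    observed iteration counts."""
    return preconditioner_traffic(block_sizes, formats) * iterations
```

With this model, storing a block in fp16 instead of fp64 cuts that block's per-iteration traffic by 4x, but the saving only pays off overall if the lower precision does not inflate the iteration count too much, which is the trade-off the adaptive-precision variant targets.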