2020
DOI: 10.1098/rspa.2020.0110

Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems

Abstract: Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in …
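The core idea the abstract points to can be illustrated outside the GPU setting. Below is a minimal NumPy/SciPy sketch of two-precision iterative refinement, not the authors' tensor-core implementation: the matrix is factorized once in single precision (standing in for the FP16 tensor-core factorization), and the solution is then refined with residuals computed in double precision. The function name mixed_precision_refine and the test problem are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_refine(A, b, max_iters=20, tol=1e-12):
    """Two-precision iterative refinement: low-precision LU, FP64 residuals."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Factorize once in low precision -- the O(n^3) step that the paper
    # accelerates with FP16 tensor cores (float32 stands in for FP16 here).
    lu, piv = lu_factor(A64.astype(np.float32))

    # Initial solve with the low-precision factors, promoted to FP64.
    x = lu_solve((lu, piv), b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iters):
        r = b64 - A64 @ x                      # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # Correction solve reuses the same low-precision factors.
        x += lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
    return x

# Illustrative use on a small, well-conditioned system.
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant
b = rng.standard_normal(n)
x = mixed_precision_refine(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # near FP64 accuracy
```

The refinement loop recovers double-precision accuracy as long as the low-precision factorization is accurate enough for the corrections to converge.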

Cited by 46 publications (32 citation statements). References 38 publications.
“…The work illustrates that mixed-precision techniques can be of great interest for linear solvers in many engineering areas. The results show that on a single NVIDIA V100 GPU, the new solvers can be up to 4× faster than an optimized double-precision solver (Haidar et al., 2017, 2018a, 2018b, 2020).…”
Section: Dense Linear Algebra
confidence: 99%
“…The answers to these questions are of wide interest because these accelerators, despite being introduced to accelerate the training of deep neural networks (NVIDIA, 2017, p. 12), are increasingly being used in general-purpose scientific computing, where their fast low precision arithmetic can be exploited in mixed-precision algorithms (Abdelfattah et al., 2020), for example in iterative refinement for linear systems (Haidar et al., 2018a, 2018b, 2020).…”
Section: Year Of Release
confidence: 99%
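The quoted passage alludes to the fast low-precision arithmetic of these accelerators; on V100-class GPUs this means tensor cores, which take FP16 inputs and accumulate the products in FP32. The snippet below only emulates that compute mode in NumPy (no tensor cores are involved) to give a feel for the accuracy it delivers relative to an FP64 reference; the data and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
A64 = rng.standard_normal((n, n))
B64 = rng.standard_normal((n, n))

# Tensor cores take FP16 inputs, so first round the operands to half precision.
A16, B16 = A64.astype(np.float16), B64.astype(np.float16)

# Emulate the tensor-core compute mode: FP16 inputs, FP32 accumulation.
C_tc = A16.astype(np.float32) @ B16.astype(np.float32)

C_ref = A64 @ B64                                  # FP64 reference
rel_err = np.linalg.norm(C_tc - C_ref) / np.linalg.norm(C_ref)
print(f"relative error of emulated tensor-core product: {rel_err:.2e}")
# The error sits near the FP16 unit roundoff (~1e-3): cheap enough to build a
# factorization from, but far from FP64 accuracy -- hence iterative refinement.
```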
“…A decade after the two-precision iterative refinement work by Buttari et al., Carson and Higham introduced a GMRES-based iterative refinement algorithm that uses up to three precisions for the solution of linear systems (Carson & Higham, 2017; Carson & Higham, 2018). This algorithm enabled Haidar et al. (Haidar et al., 2018a; Haidar et al., 2020; Haidar et al., 2018b) to successfully exploit the half-precision floating-point arithmetic units of NVIDIA tensor cores in the solution of linear systems. Compared with linear solvers using exclusively double precision, their implementation shows up to a 4×–5× speedup while still delivering double-precision accuracy (Haidar et al., 2020; Haidar et al., 2018b).…”
Section: Introduction
confidence: 99%
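The three-precision structure described in the quoted passage can be sketched as follows, again in NumPy/SciPy rather than the MAGMA or cuSOLVER code: float32 stands in for the FP16 factorization precision, float64 is the working precision, and the separate higher residual precision of Carson and Higham's algorithm is collapsed into float64 for simplicity. Each refinement step solves the correction equation A d = r with GMRES preconditioned by the low-precision LU factors; the name gmres_ir is illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve
from scipy.sparse.linalg import LinearOperator, gmres

def gmres_ir(A, b, max_iters=10, tol=1e-12):
    """GMRES-based iterative refinement with a low-precision LU preconditioner."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    n = A64.shape[0]

    # Low-precision LU factorization (float32 standing in for FP16).
    lu, piv = lu_factor(A64.astype(np.float32))

    # The factors, applied in float64, serve as the preconditioner for GMRES.
    def apply_factors(v):
        return lu_solve((lu, piv), v.astype(np.float32)).astype(np.float64)
    M = LinearOperator((n, n), matvec=apply_factors)

    x = apply_factors(b64)                 # initial solution from the factors
    for _ in range(max_iters):
        r = b64 - A64 @ x                  # residual in the working precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # Correction equation A d = r, solved by preconditioned GMRES.
        d, _ = gmres(A64, r, M=M, maxiter=50)
        x += d
    return x

# Illustrative use on a small, well-conditioned system.
rng = np.random.default_rng(2)
n = 300
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x = gmres_ir(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

Reusing the low-precision factors as a preconditioner is the key design point: GMRES then converges in a handful of iterations per refinement step even when the factorization alone is too inaccurate to solve the system directly.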
“…This algorithm enabled Haidar et al. (Haidar et al., 2018a; Haidar et al., 2020; Haidar et al., 2018b) to successfully exploit the half-precision floating-point arithmetic units of NVIDIA tensor cores in the solution of linear systems. Compared with linear solvers using exclusively double precision, their implementation shows up to a 4×–5× speedup while still delivering double-precision accuracy (Haidar et al., 2020; Haidar et al., 2018b). This algorithm is now implemented in the MAGMA library (Agullo et al., 2009; Magma, 2021) and in cuSOLVER, the NVIDIA library that provides LAPACK-like routines.…”
Section: Introduction
confidence: 99%