2021
DOI: 10.7717/peerj-cs.330

Numerical behavior of NVIDIA tensor cores

Abstract: We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial…
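As a hedged illustration of the kind of probe the abstract describes, the sketch below uses the CUDA WMMA API to multiply half-precision 16×16 matrices on the tensor cores with inputs chosen so that the exact value of one dot product, 1 + 2^-24 + 2^-25, is not representable in binary32 and is not a rounding tie. The kernel name, tile shape, and input values are illustrative assumptions, not the authors' actual test harness; the point is only that truncation (RZ) leaves exactly 1 in the output, whereas round to nearest would give 1 + 2^-23.

```cuda
// Minimal WMMA rounding probe (illustrative; needs a GPU with tensor cores,
// compute capability >= 7.0; build with e.g. `nvcc -arch=sm_70 probe.cu`).
#include <cmath>
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp multiplies a 16x16x16 half-precision tile and accumulates into
// binary32 on the tensor cores, so the host can inspect how the hardware
// rounded the result.
__global__ void wmma_probe(const half *a, const half *b, const float *c, float *d)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::load_matrix_sync(a_frag, a, 16);
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::load_matrix_sync(c_frag, c, 16, wmma::mem_row_major);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(d, c_frag, 16, wmma::mem_row_major);
}

int main()
{
    half  ha[256], hb[256];
    float hc[256] = {}, hd[256] = {};
    for (int i = 0; i < 256; ++i) { ha[i] = __float2half(0.0f); hb[i] = __float2half(0.0f); }

    // Exact value of D(0,0) = A(0,0)*B(0,0) + A(0,1)*B(1,0) + C(0,0)
    //                       = 2^-24        + 2^-25         + 1.
    hc[0] = 1.0f;
    ha[0] = __float2half(ldexpf(1.0f, -12));   // A(0,0) = 2^-12
    hb[0] = __float2half(ldexpf(1.0f, -12));   // B(0,0) = 2^-12 (col-major index 0)
    ha[1] = __float2half(ldexpf(1.0f, -12));   // A(0,1) = 2^-12 (row-major index 1)
    hb[1] = __float2half(ldexpf(1.0f, -13));   // B(1,0) = 2^-13 (col-major index 1)

    half *da, *db; float *dc, *dd;
    cudaMalloc(&da, sizeof(ha)); cudaMalloc(&db, sizeof(hb));
    cudaMalloc(&dc, sizeof(hc)); cudaMalloc(&dd, sizeof(hd));
    cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);
    cudaMemcpy(dc, hc, sizeof(hc), cudaMemcpyHostToDevice);

    wmma_probe<<<1, 32>>>(da, db, dc, dd);           // WMMA operations are warp-wide
    cudaMemcpy(hd, dd, sizeof(hd), cudaMemcpyDeviceToHost);

    // Truncation (RZ) leaves exactly 1.0; round to nearest would give 1 + 2^-23.
    printf("D(0,0) = %.9e\n", hd[0]);
    return 0;
}
```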

Cited by 28 publications (33 citation statements); references 13 publications.
“…We have also performed experiments using the AMD dual-socket EPYC Naples system with 64 cores and the NVIDIA P100 GPU, and we obtained similar results. We note that the arithmetic properties of the NVIDIA GPUs are investigated in Fasi et al. (2021).…”
Section: Methods
confidence: 99%
“…The accumulator inside Tensor Cores has at least 2 extra bits of mantissa, and RZ is used for rounding [6]. It follows that RZ is applied to the accumulator frag_c in every iteration of the k loop in Code 2.…”
Section: Avoiding RZ During Tensor Core Accumulation
confidence: 99%
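The statement above concerns truncation being applied to the accumulator fragment on every iteration of the k loop. As a rough, self-contained illustration (not the cited Code 2, which is not reproduced here), the sketch below accumulates the same terms once with round-to-nearest additions and once with round-toward-zero additions using the standard CUDA device intrinsics __fadd_rn and __fadd_rz; the RZ sum drifts below the RN sum because every truncation discards low-order bits in the same direction.

```cuda
// Illustrative only: simulates the one-sided bias of per-addition round toward zero.
// Build with e.g. `nvcc rz_bias.cu`.
#include <cmath>
#include <cstdio>

__global__ void accumulate(const float *x, int k, float *sum_rn, float *sum_rz)
{
    float rn = 0.0f, rz = 0.0f;
    for (int i = 0; i < k; ++i) {
        rn = __fadd_rn(rn, x[i]);   // IEEE round to nearest (even)
        rz = __fadd_rz(rz, x[i]);   // round toward zero, i.e. truncation
    }
    *sum_rn = rn;
    *sum_rz = rz;
}

int main()
{
    const int k = 1 << 20;
    float *x, *sum_rn, *sum_rz;
    cudaMallocManaged(&x, k * sizeof(float));
    cudaMallocManaged(&sum_rn, sizeof(float));
    cudaMallocManaged(&sum_rz, sizeof(float));
    for (int i = 0; i < k; ++i) x[i] = 1.0f + ldexpf(1.0f, -20);  // 1 + 2^-20

    accumulate<<<1, 1>>>(x, k, sum_rn, sum_rz);
    cudaDeviceSynchronize();

    // The RZ sum is systematically below the RN sum, and the gap grows with k.
    printf("RN sum = %.6f\nRZ sum = %.6f\n", *sum_rn, *sum_rz);
    return 0;
}
```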
“…Jia et al. and Raihan et al. analyze how Tensor Core assembly instructions divide the input matrices and the order in which they compute multiplications of the subdivided matrices [14,25]. There have also been studies of how Tensor Cores support subnormal numbers and use RZ (round toward zero) [6]. Others have performed error analysis of Tensor Cores, in which the theoretical error bound of mixed-precision block FMA computation is analyzed and compared to the actual error of Tensor Cores [1].…”
Section: Introduction
confidence: 99%
“…Here, data reuse is employed to reduce energy consumption and memory accesses. A numerical study of NVIDIA's tensor cores was carried out: their floating-point operations were examined, their shortcomings were identified, and the non-monotonicity issue affecting floating-point results was explained (Fasi et al., 2021).…”
Section: Introduction
confidence: 99%