Analysis of Posit and Bfloat Arithmetic of Real Numbers for Machine Learning

Romanov, A. Yu.; Stempkovsky, A. L.; Lariushkin, Ilia V.; Novoselov, Georgy E.; Solovyev, Roman; Starykh, Vladimir A.; Romanova, I. I.; Telpukhov, Dmitry; Mkrtchan, Ilya A.

doi:10.1109/access.2021.3086669

“…The advantage of representing the numerical value O in the bfloat16 format is that it keeps one sign bit s(O) and the 8-bit exponent e(O) equal to the IEEE 754 single-precision floating-point format but shortens the mantissa m(O) to 7 bits. Thus, it enables using tiny numerical values, important in the neural network learning phase [18] for example. While the multiplier determines the sign and the exponent exactly, it follows the idea of the approximate iterative logarithmic multiplier to compute the mantissa.…”

Section: The Design of the Bfilm Multiplier

confidence: 99%
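
The bit layout named in the citation statement (1 sign bit, 8 exponent bits, 7 mantissa bits, with the exponent shared with IEEE 754 single precision) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the conversion simply truncates the low 16 bits of the float32 encoding; real hardware and libraries often round to nearest even instead, so treat this as an illustration of the format and not of any particular implementation.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern of x, obtained by truncation.

    Assumption: the upper 16 bits of the float32 encoding are kept and the
    lower 16 mantissa bits are dropped (round toward zero).
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16


def bfloat16_fields(bits16: int) -> tuple[int, int, int]:
    """Split a bfloat16 pattern into (sign, biased exponent, mantissa)."""
    sign = bits16 >> 15              # 1 bit
    exponent = (bits16 >> 7) & 0xFF  # 8 bits, same bias (127) as float32
    mantissa = bits16 & 0x7F         # 7 bits
    return sign, exponent, mantissa


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 pattern back to a Python float via float32."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]


if __name__ == "__main__":
    for value in (1.0, -2.5, 1e-30):
        bits = float_to_bfloat16_bits(value)
        print(value, hex(bits), bfloat16_fields(bits), bfloat16_bits_to_float(bits))
```

Because the exponent field is as wide as in float32, a value such as 1e-30 stays nonzero after conversion, which is the "tiny numerical values" property the statement refers to; only the precision of the mantissa is reduced.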

Energy-efficient neural network learning with accuracy-adjustable floating-point multiplier

Pilipović, Bulić, Lotrič

2023

Preprint


This paper proposes a novel approximate bfloat16 multiplier with on-the-fly adjustable accuracy for energy-efficient learning in deep neural networks. The size of the proposed multiplier is only 62% of the size of the exact bfloat16 multiplier. Furthermore, its energy footprint is up to five times smaller than the footprint of the exact bfloat16 multiplier. We demonstrate the advantages of the proposed multiplier in deep neural network learning, where we successfully train the ResNet-20 network on the CIFAR-10 dataset from scratch.
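
As a rough illustration of how an iterative logarithmic multiplier can trade accuracy for work, the following Python sketch applies the generic textbook scheme to unsigned integers. It is not the design proposed in the paper: the function name and the `stages` parameter are hypothetical, and applying it to the bfloat16 mantissa (with sign and exponent handled exactly elsewhere) is an assumption based on the citation statement. Each stage writes an operand as its leading power of two plus a residue, keeps the three cheap shift-and-add terms of the product, and passes the product of the two residues on to the next stage.

```python
def ilm_multiply(a: int, b: int, stages: int) -> int:
    """Approximate a * b for non-negative integers with a generic
    iterative logarithmic multiplier sketch.

    With a = 2**ka + ra and b = 2**kb + rb, the exact product is
        2**(ka + kb) + ra * 2**kb + rb * 2**ka + ra * rb.
    Each stage adds the first three terms (shifts only) and defers the
    residue product ra * rb to the next stage. More stages give a
    smaller error; the result never exceeds the exact product.
    """
    total = 0
    for _ in range(stages):
        if a == 0 or b == 0:
            break  # remaining residue product is zero, result is exact
        ka = a.bit_length() - 1  # position of the leading one bit of a
        kb = b.bit_length() - 1  # position of the leading one bit of b
        ra = a - (1 << ka)       # residue of a below its leading one
        rb = b - (1 << kb)       # residue of b below its leading one
        total += (1 << (ka + kb)) + (ra << kb) + (rb << ka)
        a, b = ra, rb            # next stage approximates ra * rb
    return total


if __name__ == "__main__":
    for stages in (1, 2, 3):
        print(stages, ilm_multiply(13, 11, stages), 13 * 11)
```

In this sketch the number of stages plays the role of an accuracy knob: for 13 × 11 one stage gives 128, two stages give 142, and three stages reach the exact 143.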

