2021
DOI: 10.1109/access.2021.3086669
|View full text |Cite
|
Sign up to set email alerts
|

Analysis of Posit and Bfloat Arithmetic of Real Numbers for Machine Learning

Abstract: Modern computational tasks are often required to not only guarantee predefined accuracy, but get the result fast. Optimizing calculations using floating point numbers, as opposed to integers, is a nontrivial task. For this reason, there is a need to explore new ways to improve such operations. This paper presents analysis and comparison of various floating point formatsfloat, posit and bfloat. One of the promising areas in which the problem of using such values can be considered to be the most acute is neural … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2

Citation Types

0
2
0

Year Published

2023
2023
2023
2023

Publication Types

Select...
5
2

Relationship

0
7

Authors

Journals

citations
Cited by 14 publications
(2 citation statements)
references
References 15 publications
0
2
0
Order By: Relevance
“…The advantage of representing the numerical value O in the bfloat16 format is, that it keeps one sign bit s(O) and the 8-bit exponent e(O) equal to the IEEE 754 single-precision floatingpoint format but shortens the mantissa m(O) to 7 bits. Thus, it enables using tiny numerical values, important in the neural network learning phase [18] for example. While the multiplier determines the sign and the exponent exactly, it follows the idea of the approximate iterative logarithmic multiplier to compute the mantissa.…”
Section: The Design Of the Bfilm Multipliermentioning
confidence: 99%
“…The advantage of representing the numerical value O in the bfloat16 format is, that it keeps one sign bit s(O) and the 8-bit exponent e(O) equal to the IEEE 754 single-precision floatingpoint format but shortens the mantissa m(O) to 7 bits. Thus, it enables using tiny numerical values, important in the neural network learning phase [18] for example. While the multiplier determines the sign and the exponent exactly, it follows the idea of the approximate iterative logarithmic multiplier to compute the mantissa.…”
Section: The Design Of the Bfilm Multipliermentioning
confidence: 99%
“…The advantage of representing the numerical value O in the bfloat16 format is, that it keeps one sign bit s(O) and the 8-bit exponent e(O) equal to the IEEE 754 single-precision floatingpoint format but shortens the mantissa m(O) to 7 bits. Thus, it enables using tiny numerical values, important in the neural network learning phase [18] for example. While the multiplier determines the sign and the exponent exactly, it follows the idea of the approximate iterative logarithmic multiplier to compute the mantissa.…”
Section: The Design Of the Bfilm Multipliermentioning
confidence: 99%