2022
DOI: 10.48550/arxiv.2203.06673
Preprint

FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support

Abstract: Training deep neural networks (DNNs) is a computationally expensive job, which can take weeks or months even with high-performance GPUs. As a remedy for this challenge, the community has started exploring the use of more efficient data representations in the training process, e.g., block floating point (BFP). However, prior work on BFP-based DNN accelerators relies on a specific BFP representation, making it less versatile. This paper builds upon an algorithmic observation that we can accelerate the training by lev…
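To make the abstract's central idea concrete, the following is a minimal sketch of block floating point quantization, assuming a toy format in which each block shares one power-of-two exponent and each value keeps only a signed integer mantissa; the function name bfp_quantize and the block_size/mantissa_bits parameters are illustrative placeholders, not the paper's multi-mode configurations.

```python
import numpy as np

def bfp_quantize(x, block_size=16, mantissa_bits=8):
    """Round-trip a 1-D array through a toy BFP format: each block of
    `block_size` values shares one power-of-two exponent, and each value
    keeps only a signed `mantissa_bits`-bit mantissa."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent: smallest power of two covering the block's largest magnitude.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    safe_max = np.where(max_abs > 0, max_abs, 1.0)
    shared_exp = np.ceil(np.log2(safe_max))

    # Scale so mantissas fit in the signed range, round, then clip the edge case
    # where the block maximum is exactly a power of two.
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    mantissa = np.clip(np.round(blocks / scale),
                       -(2 ** (mantissa_bits - 1)),
                       2 ** (mantissa_bits - 1) - 1)

    # Dequantize so the error versus the original values can be inspected.
    return (mantissa * scale).reshape(-1)[: x.size]

vals = np.random.randn(64)
print(np.max(np.abs(vals - bfp_quantize(vals))))  # worst-case quantization error
```

The sketch only illustrates why a shared per-block exponent reduces storage and multiplier width relative to full floating point; the paper's contribution (multi-mode BFP support in hardware) is not reproduced here.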

Cited by 1 publication (2 citation statements)
References 44 publications
“…The prior work on designing energy-efficient DNN accelerators mostly focuses on Conv/FC operations [26], [27], [34], [36], while there is a lack of research on making the BN hardware more efficient. One of the most effective ways of improving the hardware efficiency of a processing unit is reducing the bit-precision.…”
Section: A. Compute Units for BN Layers (mentioning)
confidence: 99%
“…With mixed-precision training, multiplications are performed in FP16 while accumulations are performed in FP32. Unconventional data representations suited to DNN training, such as bfloat16 [13], [19] and the block floating point (BFP) representation [8], [26], have been studied as well. To enable DNN training at much lower hardware cost, possibly at the edge, researchers have explored low-precision training using FP8 (i.e., 8-bit floating point) with the support of squeeze-and-shift operations [4] or exponent biases [9] to cover the wide dynamic range of the original data distribution.…”
Section: Introduction (mentioning)
confidence: 99%
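As a hedged illustration of the mixed-precision pattern quoted above (FP16 multiplies feeding an FP32 accumulator), the NumPy sketch below emulates the numerics only; it is not the cited accelerator's datapath, and mixed_precision_dot is a hypothetical helper name.

```python
import numpy as np

def mixed_precision_dot(a, b):
    """Dot product with FP16 multiplies and an FP32 accumulator."""
    a16 = np.asarray(a, dtype=np.float16)
    b16 = np.asarray(b, dtype=np.float16)
    acc = np.float32(0.0)
    for x, y in zip(a16, b16):
        prod = x * y                 # product formed at FP16 precision
        acc = acc + np.float32(prod) # accumulation carried in FP32
    return acc

a = np.random.randn(1024)
b = np.random.randn(1024)
print(mixed_precision_dot(a, b))     # mixed-precision result
print(np.float32(np.dot(a, b)))      # higher-precision reference, cast for comparison
```

Keeping the accumulator in FP32 avoids the error growth that pure FP16 accumulation would introduce over long reductions, which is the reason the quoted scheme splits the two precisions.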