2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA)
DOI: 10.1109/isca52012.2021.00021
RaPiD: AI Accelerator for Ultra-low Precision Training and Inference

Cited by 54 publications (15 citation statements)
References 51 publications
“…Compared to [10], [31], FlexBlock allows more fine-grained blocking of sub-tensors to support variable precisions for accelerating training, as discussed in Section VII-B. In very recent work on low-precision training [1], [47], [58], [65], [67], 8-bit floating point (FP8 or HFP8) has been used to train DNNs with little accuracy loss across a wide spectrum of benchmarks. However, the hardware associated with FP8 training fixes specific mantissa and exponent widths for maximum energy efficiency, which limits flexibility.…”
Section: B Reduced Precision During Dnn Trainingmentioning
confidence: 99%
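The trade-off the excerpt describes hinges on how an FP8 format splits its 8 bits between exponent and mantissa. The sketch below simulates rounding a value into a simple FP8-like format with configurable widths; the function name and parameters are illustrative, not from the paper, and the format is a plain IEEE-style layout (no NaN/Inf encodings, so e.g. a 4-bit-exponent/3-bit-mantissa format tops out at 240 rather than the 448 of reclaimed-encoding variants).

```python
import math

def quantize_fp8(x, exp_bits=4, man_bits=3, bias=None):
    """Round x to the nearest value representable in a simple FP8-like
    format with the given exponent/mantissa widths (illustrative sketch;
    no NaN/Inf handling, IEEE-style bias and subnormals)."""
    if x == 0.0:
        return 0.0
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1          # standard IEEE-style bias
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    # Exponent of the leading bit, clamped to the representable range.
    e = math.floor(math.log2(mag))
    e_min = 1 - bias                            # smallest normal exponent
    e_max = 2 ** exp_bits - 2 - bias            # largest normal exponent
    e = max(min(e, e_max), e_min)
    # Round the significand to man_bits fractional bits at this exponent;
    # clamping e to e_min makes subnormal values fall out naturally.
    scale = 2.0 ** (man_bits - e)
    q = round(mag * scale) / scale
    # Clamp overflow from rounding past the largest normal value.
    max_val = (2 - 2 ** -man_bits) * 2.0 ** e_max
    return sign * min(q, max_val)
```

Varying `exp_bits`/`man_bits` (e.g. 4/3 versus 5/2) shows concretely why fixing one split in hardware maximizes efficiency for one precision profile but cannot serve workloads that need the other.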
“…High-Performance Application-Specific Accelerators. The methodology proposed in this work through BiSon-e offers even more flexibility than high-performance application-specific accelerators such as [14, 48]. For example, [14] represents a state-of-the-art DNN accelerator for mobile devices, featuring 192 processing elements and line buffers for a total area of 36 mm² in the TSMC 65 nm technology node.…”
Section: Related Workmentioning
confidence: 99%
“…A spatial compute array is the key component in many popular low-cost CNN accelerators [50, 58, 97, 113–123].…”
Section: Spatial Architectures For Cnn Inferencementioning
confidence: 99%