2020 IEEE Symposium on VLSI Circuits
DOI: 10.1109/vlsicircuits18222.2020.9162917

A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference

Cited by 30 publications (6 citation statements)
References: 0 publications
“…The DNN training may not be the only answer for AI to reach human intelligence, but it will lead to the harmonious coexistence of AI and human beings. [3,4,10] • Mixed-mode Computing [5,8,9] • Stochastic Rounding Circuit [29] • Binary BW Computing MAC [34] • FP-FXP Fused Multiply-Add Unit [16,18,19,21,22,24,26,31] PE/Circuit Level Feature…”
Section: Discussion (mentioning)
confidence: 99%
“…[29] [26,31,51,78] Sparsity Exploitation [17][18][19] was proposed to unify the data representation method of both input operand and accumulation. Flexpoint [55] tried to substitute FP with FXP representation using a shared exponent management algorithm together for simplification of MAC design, but it failed to reduce the required bit-precision to less than 16-bit.…”
Section: A New Number Representation (mentioning)
confidence: 99%
“…Each compute array is specialized for certain type of AI operations, allowing for higher circuit customization and density as well as lower latency and power since both compute arrays may not be in use. It exploits a systolic dataflow architecture similar to designs in [4,33,60,65,92]. Discrete synchronization hardware and micro-instructions in the various engines allow for synchronization of the operations within the accelerator and with the general purpose core that initiates the execution of NNPA instructions.…”
Section: Compute (mentioning)
confidence: 99%