2021 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/isscc42613.2021.9365791

9.1 A 7nm 4-Core AI Chip with 25.6TFLOPS Hybrid FP8 Training, 102.4TOPS INT4 Inference and Workload-Aware Throttling

Cited by 61 publications (10 citation statements) | References 6 publications

“…The computation model of a neuron is illustrated in Fig. 2 and given by (1). Each neuron performs a weighted sum of all its inputs and then a bias term is added for a possible offset [12].…”
Section: Brief Background On Deep Neural Network
confidence: 99%
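As a concrete reading of the neuron model the excerpt describes (a weighted sum of all inputs plus a bias term), here is a minimal NumPy sketch; the function and the input values are illustrative, not taken from the cited paper:

```python
import numpy as np

def neuron(x, w, b):
    """Single-neuron computation: weighted sum of the inputs plus a bias.

    x : input vector, w : weight vector of the same length, b : scalar bias.
    Returns the pre-activation value y = sum_i(w_i * x_i) + b.
    """
    return np.dot(w, x) + b

# Illustrative values only: three inputs with arbitrary weights.
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.25
y = neuron(x, w, b)  # 0.8*0.5 + 0.1*(-1.2) + (-0.4)*2.0 + 0.25 = -0.27
print(y)
```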
“…In [114] fundamental bit decomposition architectures (vertical and horizontal decomposition) are further approximated by constraining the maximum value of the partial sums. IBM RAPID [30] uses DLFloat16, 2-bit (INT2) and 1-bit fxp while [1] supports DLFloat16 and HFP8 formats as well as INT4 and INT2 formats for highly scaled inference. Intel Spring Hill [129] supports FP16 as well as INT8, INT4, INT2, and even 1 bit precision operations natively.…”
Section: Precision Scaling
confidence: 99%
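For context on the INT4 inference format the excerpt lists, the sketch below shows one common symmetric per-tensor INT4 quantization scheme; the scale choice and round-to-nearest step are assumptions for illustration, not the scheme used by the chip or by any of the cited designs:

```python
import numpy as np

def quantize_int4(x):
    """Symmetric per-tensor quantization of a float tensor to signed INT4.

    Signed INT4 spans [-8, 7]; here max(|x|) is mapped onto 7, so the
    scale is max(|x|) / 7, followed by round-to-nearest and clipping.
    """
    scale = max(np.max(np.abs(x)) / 7.0, 1e-12)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)  # codes held in int8
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the INT4 codes and scale."""
    return q.astype(np.float32) * scale

# Illustrative weights: note the small rounding error after a round trip.
w = np.array([0.31, -0.92, 0.07, 0.55], dtype=np.float32)
q, s = quantize_int4(w)
print(q, dequantize(q, s))
```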
“…We consider a system with a CPU attached to a co-processor capable of executing common deep learning operations such as GEMM, LSTM activation etc. at various data precisions [30]. The co-processor executes the compute-heavy data-parallel portions of the workload, while control-heavy operations (e.g., sorting and pruning hypothesis) are mapped to the CPU.…”
Section: Inference Performance In End-to-end Models
confidence: 99%
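The work split the excerpt describes, data-parallel kernels dispatched to a co-processor while control-heavy steps stay on the CPU, can be sketched as a simple dispatch loop; the op names, device labels, and run_on helper below are hypothetical:

```python
# Hypothetical work-splitting sketch: compute-heavy ops go to the
# co-processor ("accel"), control-heavy ops stay on the CPU.
COMPUTE_HEAVY = {"gemm", "lstm", "conv"}  # data-parallel kernels

def run_on(device, op):
    """Placeholder dispatcher; a real system would call the device runtime."""
    print(f"{op} -> {device}")

def execute(graph):
    for op in graph:
        device = "accel" if op in COMPUTE_HEAVY else "cpu"
        run_on(device, op)

# Toy beam-search-style workload: GEMM/LSTM on the accelerator,
# sorting and pruning of hypotheses on the CPU.
execute(["gemm", "sort", "prune", "lstm"])
```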
“…[15] In early 2021, IBM announced the results of its work on the world's first energy-efficient AI chip with low-precision training and inference, constructed with 7-nm technology.[16] The chip integrates power management with improved model performance and power use. Codesign approaches integrating special AI models and hardware to achieve energy efficiency are also noteworthy.…”
Section: Artificial Intelligence/Machine Learning
confidence: 99%