2022 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/isscc42614.2022.9731762

A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration
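The title's "bitwise in-memory Booth multiplication" builds on radix-4 (modified) Booth recoding, which halves the number of partial products by examining overlapping 3-bit windows of the multiplier. As background only, here is a minimal software sketch of that recoding for signed integers; it illustrates the encoding, not the paper's CIM circuit, and the 16-bit default width is an assumption for illustration.

```python
def booth_radix4_multiply(a: int, b: int, bits: int = 16) -> int:
    """Multiply two signed integers via radix-4 (modified) Booth recoding.

    Each overlapping 3-bit window of the multiplier b selects a partial
    product from {0, +-a, +-2a}, halving the partial-product count
    relative to bit-serial multiplication. b must fit in `bits` bits
    (two's complement), and `bits` must be even.
    """
    mask = (1 << bits) - 1
    ub = b & mask          # two's-complement view of b
    product = 0
    prev = 0               # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        # 3-bit window: b[i+1], b[i], b[i-1]
        window = (((ub >> i) & 0b11) << 1) | prev
        prev = (ub >> (i + 1)) & 1
        # Booth digit = -2*b[i+1] + b[i] + b[i-1]
        multiple = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[window]
        product += (multiple * a) << i
    return product
```

In a digital CIM macro this table lookup is what each bit group of the stored operand drives, so only half as many accumulation steps are needed per multiplication.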

Cited by 63 publications (14 citation statements)
References 2 publications
“…Yet most existing SRAM-based IMC structures have low data density [8]. The data density in [41,42] is lower than 0.3 Mb mm⁻². This inevitably causes more off-chip memory communication during training and can hardly scale out for large-scale training tasks at low cost.…”
Section: Introduction
“…Recently, several works have employed floating-point formats for IMC designs: Tu et al. [41] developed a digital processor handling GeMM between integer and FP numbers, reaching 14 TFLOPS/W when computing GeMM in the BFloat16 format. Lee et al. [42] designed an SRAM-based IMC circuit processing GeMM between BFloat16 operands, which separates exponent and fraction storage, with a best computing efficiency of 1.43 TFLOPS/W; Lee et al. [43] present a DRAM-based near-memory computing design with high throughput of 1 TFLOPS per chip.…”
Section: Introduction
“…In general, the fully digital approach is exceptionally robust to various nonidealities, such as device variability, drift, noise, or IR drop, and it can have higher reconfigurability [134]–[136]. However, because of the accumulation through counting, the parallelism of the computation is limited…”
Section: A. Fully Digital Circuits
“…For example, block-wise weight sparsity and dynamic activation sparsity are proposed to apply sparsity techniques to the regular CIM structure [14]. Set-associative block-wise sparsity, tensor-train compression, and bitwise sparsity are also explored to save execution time or power consumption [15,26].…”
Section: System-Level CIM Chips