2021 IEEE International Solid-State Circuits Conference (ISSCC)
DOI: 10.1109/isscc42613.2021.9365788
15.1 A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing

Cited by 113 publications (66 citation statements)
References 5 publications
“…At an operation frequency of 1 GHz and a supply voltage of 0.8 V, the HERMES core shows a peak throughput of 1.008 TOPS at an efficiency of 10.5 TOPS/W when running the MNIST-based experiment described above. Compared to the designs shown in Table I, the measured throughput density of 1.59 TOPS/mm² is significantly higher than recent non-volatile ReRAM-based designs [24], [45] and also slightly higher than recent SRAM + capacitor-based designs [44] when 8-bit input quantization is used. Only the SRAM-based design in [46] shows a better throughput density, owing to its compact 8T SRAM unit-cell design employing push-rules and its advanced manufacturing node.…”
Section: System-level Performance
confidence: 69%
“…By selecting the appropriate D flip-flop that receives A, the increment size of the counter can be made variable. This allows shift-and-add operations to be executed within the ADC at minimal overhead, avoiding dedicated multi-bit adders [42]–[44]. Hence, we enabled bit-serial input modulation in addition to the conventional multi-bit pulse-width modulation (PWM).…”
Section: Counter With Variable Increment Size
confidence: 99%
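The variable-increment counter described above can be illustrated with a behavioral sketch: inputs are applied one bit plane at a time, and each plane's partial sum is folded into the running total by incrementing with a weight of 2^bit instead of using a separate multi-bit shift-and-add unit. This is a minimal model, assuming MSB-first bit-serial inputs; the function and variable names are illustrative, not from the paper.

```python
def bit_serial_mac(weights, inputs, n_bits=4):
    """Accumulate sum_i w_i * x_i by processing one input bit plane per cycle.

    Selecting which D flip-flop receives the incoming event models the
    variable increment size: the counter advances by 2**bit per active
    product in the current bit plane, so no dedicated multi-bit adder
    is needed for the shift-and-add.
    """
    counter = 0
    for bit in range(n_bits - 1, -1, -1):  # MSB first
        # partial sum for this bit plane: products of weights with the
        # current input bit of every element
        partial = sum(w * ((x >> bit) & 1) for w, x in zip(weights, inputs))
        counter += partial << bit          # variable increment = 2**bit
    return counter


weights = [1, 0, 1, 1]
inputs = [5, 3, 2, 7]  # 4-bit inputs
# matches the conventional multiply-accumulate result
assert bit_serial_mac(weights, inputs) == sum(
    w * x for w, x in zip(weights, inputs)
)
```

The same loop structure covers pulse-width modulation as a special case of a single "plane" whose increment weight is 1.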
“…The accumulated current (in the case of pulse-amplitude-based encoding) or the accumulated charge (in the case of pulse-width encoding) across each bitline is indicative of the dot product between the input vector and one column of the matrix. Owing to the reduction in matrix data transfers, parallelism, and the analog mode of operation, in-memory computing allows MVMs to be performed at high energy efficiency (>10 TOPS/W) [3,4]. Thus, in-memory computing can be a viable alternative to conventional GPU-based and other digital solutions for MVM acceleration, which is critical for many applications including deep learning.…”
Section: Introduction
confidence: 99%
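The bitline accumulation described above can be sketched as a behavioral model: with the matrix stored as conductances and the input vector applied as wordline voltages, Ohm's and Kirchhoff's laws make the total current on each bitline the dot product of the input vector with one matrix column, i.e. a full MVM in one step. This is a minimal sketch assuming pulse-amplitude encoding; the values are illustrative.

```python
import numpy as np


def imc_mvm(G, v):
    """Behavioral model of an analog in-memory MVM.

    G : conductance matrix (rows = wordlines, columns = bitlines)
    v : wordline input voltages

    Current on bitline j is I_j = sum_i v_i * G[i, j], so the whole
    bitline readout equals the matrix-vector product v @ G.
    """
    return v @ G


G = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [2.0, 0.0]])       # stored conductances, one column per bitline
v = np.array([0.3, 0.6, 0.9])   # input voltages on the wordlines
I = imc_mvm(G, v)               # accumulated bitline currents
assert np.allclose(I, G.T @ v)  # each bitline carries one column dot product
```

The parallelism the excerpt refers to is visible here: every bitline current is produced simultaneously, so the matrix never has to be moved to a separate arithmetic unit.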
“…In-memory computing (IMC) circuits reduce data transfer by performing computations inside the memory without transferring data to processing units [1]–[6]. IMC circuits for accelerating deep neural networks (DNNs) typically implement crucial operations such as Multiply-Accumulate/Average (MAC/MAV) and activation functions (e.g., Rectified Linear Unit, ReLU).…”
Section: Introduction
confidence: 99%
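The two operations named in the excerpt above compose into the basic per-neuron computation an IMC tile accelerates: a multiply-accumulate over stored weights followed by a ReLU activation. A minimal sketch, with illustrative names and values not taken from the paper:

```python
def mac_relu(weights, activations, bias=0.0):
    """Multiply-accumulate over stored weights, then ReLU.

    The MAC is what the memory array computes in place; the ReLU is the
    pointwise activation typically applied at the array periphery.
    """
    acc = bias + sum(w * a for w, a in zip(weights, activations))
    return max(0.0, acc)  # ReLU: clamp negative sums to zero


assert mac_relu([1.0, -2.0], [3.0, 1.0]) == 1.0   # 3 - 2 = 1, passes through
assert mac_relu([1.0, -2.0], [1.0, 3.0]) == 0.0   # 1 - 6 = -5, clamped to 0
```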