2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018
DOI: 10.1109/isscc.2018.8310261

QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS

Cited by 79 publications (43 citation statements)
References 3 publications
“…State-of-the-art silicon prototypes such as QUEST [43] or UNPU [44] are exploiting such strong quantization and voltage scaling and have been able to measure such high energy efficiency with their devices. The UNPU reaches an energy efficiency of 50.6 TOp/s/W at a throughput of 184 GOp/s with 1-bit weights and 16-bit activations on 16 mm² of silicon in 65 nm technology.…”
Section: FPGA and ASIC Accelerators (mentioning)
confidence: 99%
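The two UNPU figures quoted above imply a power draw and an energy per operation. The sketch below works through that arithmetic, assuming both figures describe the same operating point, which the excerpt suggests but does not state. The function and variable names are illustrative only.

```python
def implied_power_watts(throughput_ops_per_s: float,
                        efficiency_ops_per_s_per_w: float) -> float:
    """Power = throughput / efficiency, since efficiency is (Op/s) per watt."""
    return throughput_ops_per_s / efficiency_ops_per_s_per_w


# Figures quoted in the citation statement (assumed to be one operating point).
UNPU_THROUGHPUT = 184e9      # 184 GOp/s
UNPU_EFFICIENCY = 50.6e12    # 50.6 TOp/s/W

unpu_power_w = implied_power_watts(UNPU_THROUGHPUT, UNPU_EFFICIENCY)
# Energy per operation is the reciprocal of efficiency (J/Op).
unpu_energy_per_op_j = 1.0 / UNPU_EFFICIENCY

print(f"implied power: {unpu_power_w * 1e3:.2f} mW")
print(f"energy per op: {unpu_energy_per_op_j * 1e15:.2f} fJ")
```

Under that assumption the result is roughly 3.6 mW and about 20 fJ per operation.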
“…al. [5] and the QUEST log-quantized 3D-stacked inference engine by Ueyoshi et al. [6]. Indeed, bit-serial operand feeding implicitly allows fully-variable bit precision.…”
Section: Bit-serial Designs (mentioning)
confidence: 99%
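The remark that bit-serial feeding allows variable precision can be illustrated with a small software model. This is a minimal sketch and not the QUEST or any cited datapath: it assumes unsigned activations fed least-significant bit first, one bit plane per cycle, and the name `bit_serial_dot` is hypothetical.

```python
def bit_serial_dot(weights, activations, act_bits):
    """Dot product with activations consumed one bit plane per 'cycle'.

    Assumes unsigned integer activations; only the lowest `act_bits` bits
    of each activation are used, so precision is set by the cycle count.
    """
    acc = 0
    for bit in range(act_bits):                 # one cycle per activation bit
        plane_sum = 0
        for w, a in zip(weights, activations):
            if (a >> bit) & 1:                  # 1-bit operand gates the weight
                plane_sum += w
        acc += plane_sum << bit                 # weight the plane by 2**bit
    return acc


weights = [3, -2, 5]
activations = [5, 7, 2]

full = bit_serial_dot(weights, activations, act_bits=4)
reference = sum(w * a for w, a in zip(weights, activations))
print(full == reference)   # full precision matches the ordinary dot product

# Fewer cycles means lower precision: only the low 2 bits of each activation.
low = bit_serial_dot(weights, activations, act_bits=2)
print(low)
```

The same loop serves any activation width: changing `act_bits` changes only the number of cycles, which is the sense in which precision is variable in this model.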
“…It has led to a new trend for precision-scalable neural processors to minimize energy at target performance without giving up flexibility. Recent papers have introduced runtime configurable MAC architectures optimized for deep learning, built either with high parallelization capabilities [3], [4] or bit-serial approaches [5], [6].…”
Section: Introduction (mentioning)
confidence: 99%
“…6 (a)) forms a primitive binary neural network accelerator based on the typical output-input-channel parallelism, where each PE row corresponds to an input channel and each PE column corresponds to an output channel. This is a binary-only subset of a single core of the architecture proposed in [4]; the weights and inputs/outputs are all in 1-bit. In this configuration, an input activation is shared among multiple output channels (i.e.…”
Section: FPGA Implementation (mentioning)
confidence: 99%
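The row/column mapping described in the excerpt above can be modelled in a few lines. This is a sketch under stated assumptions, not the cited design: it assumes the common XNOR-popcount formulation, bits 0/1 encoding the values -1/+1, and a sign threshold at the output, none of which the excerpt confirms. The name `binary_pe_array` is hypothetical.

```python
def binary_pe_array(inputs, weights):
    """Model a PE array: rows are input channels, columns are output channels.

    inputs:  list of 0/1 bits, one per input channel (PE row).
    weights: weights[row][col], 0/1 bits, one per PE.
    Returns one 0/1 output bit per output channel (PE column).
    Assumption: bit 0 encodes -1 and bit 1 encodes +1.
    """
    n_rows = len(inputs)
    n_cols = len(weights[0])
    outputs = []
    for col in range(n_cols):
        matches = 0
        for row in range(n_rows):
            # The same input bit is shared by every column in this row.
            matches += 1 - (inputs[row] ^ weights[row][col])   # XNOR
        # In the +/-1 encoding the column sum is 2*matches - n_rows.
        column_sum = 2 * matches - n_rows
        outputs.append(1 if column_sum >= 0 else 0)            # sign threshold
    return outputs


inputs = [1, 0, 1, 1]
weights = [
    [1, 0],
    [0, 1],
    [1, 0],
    [1, 0],
]
result = binary_pe_array(inputs, weights)
print(result)
```

Column 0 holds weights that agree with every input bit and column 1 holds the complement, so the two output channels land on opposite sides of the threshold.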