Proceedings of the 26th Asia and South Pacific Design Automation Conference 2021
DOI: 10.1145/3394885.3431554

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

Cited by 30 publications (17 citation statements) | References 1 publication

“…Pruning [19,71,78,79,132,134,140,171,200,265,288] Quantization [19,68,90,134,166,179,291,307,311,314] Knowledge Distillation [29,41,42,80,83,88,95,170,186,195,220,228,231,239,257,266,267,274,295,296,300,312] Low rank factorization [76,98,119,168,190,196,210,292] Conditional Computation…”
Section: Model Compression
confidence: 99%
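
To ground the quantization entry in the taxonomy quoted above, the following is a minimal sketch of post-training uniform symmetric quantization in NumPy. The bit-width, tensor shape, and per-tensor scaling are illustrative assumptions, not details taken from any of the cited works.

```python
import numpy as np

def quantize_symmetric(w, bits=8):
    """Symmetric uniform quantization: map float weights to signed
    integer codes in [-(2**(bits-1)-1), 2**(bits-1)-1] plus the scale
    needed to dequantize them."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Illustrative usage: quantize a random "layer" and measure the error.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_symmetric(w, bits=8)
print(f"8-bit reconstruction MSE: {np.mean((w - dequantize(q, s)) ** 2):.2e}")
```
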
“…With the proposed dynamic programming assisted quantisation approach, the authors demonstrated a 16× compression in a ResNet-18 model with less than a 3% accuracy drop. The authors in [90] proposed a quantisation scheme for DNN inference that targets the weights along with the inputs to the model and the partial sums occurring inside the hardware accelerator. Experiments showed that the proposed scheme reduced inference latency and energy consumption by up to 3.89× and 4.84×, respectively, while incurring a 1.18% loss in DNN inference accuracy.…”
Section: Model Compression
confidence: 99%
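
The statement above describes quantizing three things: the weights, the model inputs, and the partial sums that arise inside the accelerator. The sketch below fake-quantizes all three in a tiled matrix-vector product to show where each quantizer sits. The per-tensor uniform quantizer, the bit-widths (4/8/16), and the tile size are assumptions for illustration and do not reproduce the mixed-precision scheme of [90].

```python
import numpy as np

def fake_quant(x, bits):
    """Uniform symmetric fake-quantization: quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def quantized_matvec(W, x, w_bits=4, x_bits=8, psum_bits=16, tile=64):
    """Matrix-vector product with quantized weights, inputs, and partial
    sums, accumulated tile by tile the way a crossbar-based accelerator
    would combine per-array results."""
    Wq = fake_quant(W, w_bits)
    xq = fake_quant(x, x_bits)
    y = np.zeros(W.shape[0])
    for start in range(0, W.shape[1], tile):
        psum = Wq[:, start:start + tile] @ xq[start:start + tile]
        y += fake_quant(psum, psum_bits)    # partial sums are quantized too
    return y

W, x = np.random.randn(128, 256), np.random.randn(256)
rel_err = np.linalg.norm(quantized_matvec(W, x) - W @ x) / np.linalg.norm(W @ x)
print(f"relative error vs. full precision: {rel_err:.3f}")
```
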
“…Several works propose hardware accelerators for basecalling [63,77,78] or read mapping [54,56-58,62,65-68,71,79-83]. Among these accelerators, non-volatile memory (NVM)-based processing in memory (PIM) accelerators offer high performance and efficiency since NVM-based PIM provides in-situ and highly-parallel computation support for matrix-vector multiplications (MVM) [101-111] and string matching operations [112-130].…”
Section: State-of-the-art Solutions
confidence: 99%
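
For context on why NVM-based PIM suits MVM: a ReRAM crossbar stores a matrix as cell conductances, applies the input vector as row voltages, and reads the product as summed bit-line currents (Ohm's and Kirchhoff's laws), so the whole MVM completes in one analog step. Below is a minimal ideal-device model of that computation; the conductance range and the differential positive/negative encoding are common conventions assumed here for illustration, and device non-idealities are ignored.

```python
import numpy as np

def crossbar_mvm(W, v, g_min=1e-6, g_max=1e-4):
    """Model an ideal analog ReRAM crossbar matrix-vector multiply.

    Each weight maps to a pair of conductances (positive and negative
    arrays); applying input voltages v to the rows yields bit-line
    currents that realize the product in a single step."""
    w_max = np.max(np.abs(W)) + 1e-12
    g_range = g_max - g_min
    # Differential encoding: W is proportional to (G_pos - G_neg).
    G_pos = g_min + g_range * np.clip(W, 0, None) / w_max
    G_neg = g_min + g_range * np.clip(-W, 0, None) / w_max
    i_out = G_pos @ v - G_neg @ v           # Kirchhoff current summation
    return i_out * w_max / g_range          # rescale currents back to weights

W, v = np.random.randn(64, 128), np.random.randn(128)
print(np.allclose(crossbar_mvm(W, v), W @ v))   # ideal model matches exactly
```
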
“…Quantization in hardware saves memory space and reduces data movement and latency for arithmetic operations [2]. Supporting full-precision computation and complex arithmetic for large neural networks on low-power hardware is challenging.…”
Section: Introduction
confidence: 99%
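
As a back-of-the-envelope check on the memory claim above: storing weights at 8 bits instead of 32 cuts both the footprint and the bytes moved per inference by 4x. The parameter count below is an arbitrary illustrative value, not a figure from the cited work.

```python
import numpy as np

n_params = 25_000_000    # roughly ResNet-50-sized; illustrative only
fp32_bytes = n_params * np.dtype(np.float32).itemsize
int8_bytes = n_params * np.dtype(np.int8).itemsize
print(f"fp32: {fp32_bytes / 2**20:.1f} MiB, "
      f"int8: {int8_bytes / 2**20:.1f} MiB, "
      f"reduction: {fp32_bytes / int8_bytes:.0f}x")
# fp32: 95.4 MiB, int8: 23.8 MiB, reduction: 4x
```
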