2018
DOI: 10.1109/tnnls.2018.2808319

Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks

Abstract: Convolutional neural networks (CNNs) have led to remarkable progress in a number of key pattern recognition tasks, such as visual scene understanding and speech recognition, that potentially enable numerous applications. Consequently, there is a significant need to deploy trained CNNs to resource-constrained embedded systems. Inference using pretrained modern deep CNNs, however, requires significant system resources, including computation, energy, and memory space. To enable efficient implementation of trained…

Cited by 223 publications (133 citation statements). References 8 publications.
“…There are several works that describe quantization and improving networks for lower bit inference and deployment [9,10,16,34]. These methods all rely strongly on finetuning, making them level 3 methods, whereas data-free quantization improves performance similarly without that requirement.…”
Section: Background and Related Workmentioning
confidence: 99%
“…Several groups have proposed new compute- and memory-efficient DNN architectures [4]-[6] and parameter-efficient neural networks, using methods such as DNN pruning [7], distillation [8], and low-precision arithmetic [9], [10]. Among these approaches, low-precision arithmetic is noted for its ability to reduce the memory capacity, bandwidth, latency, and energy consumption associated with MAC units in DNNs, and to increase the level of data parallelism [9], [11], [12]. The ultimate goal of low-precision DNN design is to reduce the hardware complexity of the original high-precision DNN model to a level suitable for edge devices without significantly degrading performance. To address the gaps in previous studies, we are motivated to study low-precision posit for DNN training on the edge.…”
mentioning
confidence: 99%
“…Vanhoucke et al [10] linearly normalize the weights and (sigmoid) activations of every layer in a speech-recognition NN to 8-bit by analysing the range of weights and activations. A similar approach is implemented in several deep learning frameworks such as Tensorflow [11] and Caffe-Ristretto [12]. Lin, Talathi, and Annapureddy [6] propose an analytical model to quickly convert pre-trained models to fixed-point.…”
Section: Related Workmentioning
confidence: 99%
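The range-based linear 8-bit scheme described in the statement above can be sketched as follows. This is a minimal illustration, assuming symmetric scaling by the largest observed magnitude; the function names and the symmetric-range choice are illustrative assumptions, not the exact scheme of Vanhoucke et al. or of any cited framework:

```python
import numpy as np

def quantize_linear_8bit(x):
    """Linearly quantize an array to int8 using its observed range.

    Sketch only: one symmetric scale factor maps the largest magnitude
    in x onto the int8 limit (here +/-127).
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return q.astype(np.float32) * scale
```

Because the scale is derived from the analysed range, the worst-case round-trip error per element is about half a quantization step (scale / 2).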
“…To reduce the number of solutions to consider, several heuristics were developed. Gysel et al [12] propose an iterative quantization procedure where weights are quantized first, and activations are quantized second. A similar two-step approach is described by other related works [13].…”
Section: Related Workmentioning
confidence: 99%
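The weights-first, activations-second heuristic described above can be sketched as a bit-width search. Assumed for illustration: an `evaluate` callback returning validation accuracy, a descending list of candidate bit widths, and an accuracy-tolerance acceptance criterion; none of these names or choices are Gysel et al.'s exact procedure:

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization of x to the given bit width,
    simulated in floating point."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def two_step_bitwidth_search(evaluate, weights, activations,
                             candidate_bits=(8, 6, 4), tolerance=0.01):
    """Two-step heuristic: choose the weight bit width first (with
    activations left in full precision), then choose the activation
    bit width with the selected weight quantization held fixed.

    Searching each dimension separately avoids evaluating every
    (weight bits, activation bits) pair.
    """
    baseline = evaluate(weights, activations)

    # Step 1: lowest weight bit width whose accuracy stays in tolerance.
    w_bits = candidate_bits[0]
    for b in candidate_bits:  # assumed sorted from high to low
        acc = evaluate([quantize(w, b) for w in weights], activations)
        if acc >= baseline - tolerance:
            w_bits = b
    q_weights = [quantize(w, w_bits) for w in weights]

    # Step 2: lowest activation bit width, weights already quantized.
    a_bits = candidate_bits[0]
    for b in candidate_bits:
        acc = evaluate(q_weights, [quantize(a, b) for a in activations])
        if acc >= baseline - tolerance:
            a_bits = b
    return w_bits, a_bits
```

With k candidate bit widths this costs 2k evaluations instead of the k² needed to test every combination, which is the point of the two-step heuristic.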