2019 · Preprint
DOI: 10.48550/arxiv.1910.04877
Bit Efficient Quantization for Deep Neural Networks

Cited by 6 publications (5 citation statements)
References 6 publications
“…Traditionally, neural networks relied on 32-bit numerical operations for training and evaluation. It was found, however, that inference can work fine with 8-bit [18], 3-bit [19], 2-bit [20], or even 1-bit (binary) weights and operations [21]. Training a model is more difficult, but models have been successfully trained using 4-bit weights [4].…”
Section: A. Managing Model Versions
confidence: 99%
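The bit widths listed in the statement above can be illustrated with a minimal sketch of uniform symmetric weight quantization. This is not the quantization scheme of the cited paper; the function name and the per-tensor scaling choice are assumptions made for illustration only.

```python
# Minimal sketch of uniform n-bit weight quantization (illustrative only,
# not the cited paper's method). Symmetric, per-tensor scaling.
import numpy as np

def quantize_weights(w: np.ndarray, bits: int) -> np.ndarray:
    """Quantize-dequantize w onto a signed n-bit grid."""
    if bits == 1:
        # Binary weights: sign times the mean magnitude, in the spirit of
        # BinaryConnect-style schemes.
        return np.sign(w) * np.abs(w).mean()
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax        # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # dequantize back to float

w = np.random.randn(256, 256).astype(np.float32)
for bits in (8, 3, 2, 1):
    err = np.abs(w - quantize_weights(w, bits)).mean()
    print(f"{bits}-bit mean absolute error: {err:.4f}")
```

Running the loop shows the quantization error growing as the bit width shrinks, which is why inference tolerates low-bit weights better than training does.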
“…Some approaches, e.g. [32,24,6], which only quantize weights to fixed point, are hard to adopt for accelerating the real inference process. Besides, some methods [27,33,5] do quantize both weights and activations to fixed point, but they usually need particular hardware or software to facilitate the implementation of quantized inference.…”
Section: Industrial Applicability
confidence: 99%
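The distinction drawn above, between quantizing weights only and quantizing both weights and activations so the multiply itself runs on integers, can be sketched as follows. The affine per-tensor scale/zero-point handling below is a simplified assumption, not any specific framework's implementation.

```python
# Hedged sketch: quantize both weights and activations to int8 so the
# matrix multiply accumulates in integers, then rescale the result.
import numpy as np

def affine_quantize(x: np.ndarray, bits: int = 8):
    """Per-tensor affine quantization to unsigned n-bit integers."""
    qmin, qmax = 0, 2 ** bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def int_matmul(a: np.ndarray, w: np.ndarray) -> np.ndarray:
    qa, sa, za = affine_quantize(a)
    qw, sw, zw = affine_quantize(w)
    # Subtract zero points, accumulate in int32, then rescale to float.
    acc = (qa - za) @ (qw - zw)
    return acc * (sa * sw)

a = np.random.rand(4, 64).astype(np.float32)
w = np.random.randn(64, 32).astype(np.float32)
print(np.abs(a @ w - int_matmul(a, w)).mean())  # small quantization error
```

Because the accumulation happens over integers, this style of scheme is what needs (and benefits from) the dedicated integer hardware or software support the citing authors mention.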
“…However, less essential operations can also be pruned after training [1535,1536]. Another approach is quantization, where ANN bit depths are decreased, often to efficient integer instructions, to increase inference throughput [1537,1538]. Quantization often decreases performance; however, the amount of quantization can be adapted to ANN components to optimize performance-throughput tradeoffs [1539].…”
Section: Deployment
confidence: 99%
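The post-training pruning mentioned alongside quantization in this statement is often done by weight magnitude. The sketch below is one common variant under that assumption; the function name and the global (rather than per-layer) threshold are illustrative choices, not taken from the cited works.

```python
# Hedged sketch of post-training magnitude pruning: zero out the smallest
# fraction of weights, removing the "less essential" operations.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest `sparsity` fraction of weights by absolute value."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

w = np.random.randn(512, 512)
pruned = magnitude_prune(w, sparsity=0.9)
print(f"fraction of zeros: {(pruned == 0).mean():.2f}")  # ~0.90
```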