2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw53098.2021.00341

In-Hindsight Quantization Range Estimation for Quantized Training

Abstract: Quantization techniques applied to the inference of deep neural networks have enabled fast and efficient execution on resource-constrained devices. The success of quantization during inference has motivated the academic community to explore fully quantized training, i.e. quantizing backpropagation as well. However, effective gradient quantization is still an open problem. Gradients are unbounded and their distribution changes significantly during training, which leads to the need for dynamic quantization. As we…
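For context, "dynamic quantization" here means re-estimating the quantization range from the live tensor at every training iteration. Below is a minimal sketch of that baseline, assuming symmetric uniform quantization with a max-abs range (the function and names are illustrative, not the paper's code). Note that it needs one pass over the tensor just to find the range and a second to quantize, which is the data-traffic cost the paper targets.

```python
import numpy as np

# Minimal sketch (an illustration, not the paper's implementation) of dynamic
# symmetric quantization: the range is re-estimated from the live tensor every
# iteration, costing an extra pass over the data before quantizing.

def dynamic_quantize(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    qmax = 2 ** (num_bits - 1) - 1
    amax = np.abs(x).max()                  # pass 1: estimate the range
    scale = max(amax, 1e-12) / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale  # pass 2: quantize

grads = np.random.randn(1024) * 0.01        # gradients: unbounded, shifting scale
grads_q = dynamic_quantize(grads)
```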

Cited by 4 publications (3 citation statements) | References 8 publications
“…Also, Sun et al. (2019) presented a novel hybrid format for full FP8 training: while the weights and activations are quantized to a [1,4,3] format, the neural gradients are quantized to a [1,5,2] format to capture a wider dynamic range. Fournarakis & Nagel (2021) suggested a method to reduce the data traffic during the calculation of the quantization range.…”
Section: Related Work
confidence: 99%
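For reference, the [s,e,m] notation denotes the number of sign, exponent, and mantissa bits. The sketch below computes the resulting normal ranges, assuming an IEEE-like layout with bias 2^(e-1)-1 and the top exponent code reserved for Inf/NaN; actual FP8 variants differ in exactly these details, so treat the numbers as indicative.

```python
# Minimal sketch (not from the cited papers): normal range of an FP8
# [sign, exponent, mantissa] format, assuming an IEEE-like layout with
# bias = 2**(e-1) - 1 and the top exponent code reserved for Inf/NaN.

def fp8_range(e: int, m: int):
    bias = 2 ** (e - 1) - 1
    max_normal = (2 - 2 ** -m) * 2 ** bias   # largest finite normal value
    min_normal = 2 ** (1 - bias)             # smallest positive normal value
    return min_normal, max_normal

for name, (e, m) in {"[1,4,3]": (4, 3), "[1,5,2]": (5, 2)}.items():
    lo, hi = fp8_range(e, m)
    print(f"{name}: min normal = {lo:.3g}, max normal = {hi:.3g}")
# [1,4,3]: min normal = 0.0156,  max normal = 240      -> finer precision
# [1,5,2]: min normal = 6.1e-05, max normal = 5.73e+04 -> wider dynamic range
```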
“…In addition, this issue might be solved with dedicated hardware, such as a unit that calculates the statistics more efficiently, or by using on-chip memory blocks that reduce data-movement overhead. A recent method (Fournarakis & Nagel, 2021) tries to reduce the data movement by reusing statistics from previous iterations, but as shown in Fig. 5a in the appendix, combining it with LUQ causes accuracy degradation.…”
Section: Future Directions
confidence: 99%
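A minimal sketch of that in-hindsight idea as described here, assuming symmetric uniform quantization and an exponential moving average of a max-abs statistic (the class, names, and exact statistic are illustrative, not taken from the paper): the current tensor is quantized with a range derived from previous iterations, so range estimation no longer requires a separate pass over the data before quantizing.

```python
import numpy as np

# Illustrative sketch (not the authors' code) of in-hindsight range estimation:
# quantize the current tensor using a range built from *previous* iterations'
# statistics, then update the running statistic from the tensor just processed.

class InHindsightQuantizer:
    def __init__(self, num_bits: int = 8, momentum: float = 0.9):
        self.qmax = 2 ** (num_bits - 1) - 1   # e.g. 127 for int8
        self.momentum = momentum
        self.running_amax = None              # statistic from past iterations

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # First iteration: no history yet, so fall back to the current range.
        amax = self.running_amax if self.running_amax is not None else np.abs(x).max()
        scale = max(amax, 1e-12) / self.qmax
        x_q = np.clip(np.round(x / scale), -self.qmax, self.qmax) * scale
        # Update the statistic "in hindsight" for use in the next iteration.
        cur = np.abs(x).max()
        self.running_amax = cur if self.running_amax is None else (
            self.momentum * self.running_amax + (1 - self.momentum) * cur)
        return x_q

quant = InHindsightQuantizer()
for step in range(3):
    grads = np.random.randn(1024) * (1 + step)  # drifting gradient distribution
    grads_q = quant(grads)
```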