2021
DOI: 10.48550/arxiv.2112.10769
Preprint

Logarithmic Unbiased Quantization: Simple 4-bit Training in Deep Learning

Abstract: Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers. In this work, we examine the impor…
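The "logarithmic unbiased" idea in the title suggests rounding neural gradients to power-of-two levels with probabilities chosen so the quantizer is unbiased in expectation. Below is a minimal NumPy sketch of such a round-to-power-of-two quantizer; the function name, the exponent-range choice, and the underflow handling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_quantize_unbiased(x, exp_bits=3, rng=None):
    """Stochastically round each entry of x to a signed power-of-two level.

    Rounding probabilities are chosen so that E[quantized] == x, i.e. the
    quantizer is unbiased. The exponent range (2**exp_bits nonzero levels,
    anchored at the tensor's maximum) is an illustrative format choice,
    not the paper's exact one.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    out = np.zeros_like(mag)
    nz = mag > 0
    if not nz.any():
        return out

    e_max = np.ceil(np.log2(mag[nz].max()))   # top power-of-two level
    e_min = e_max - (2 ** exp_bits - 1)       # smallest nonzero level

    m = mag[nz]
    e_lo = np.clip(np.floor(np.log2(m)), e_min, e_max - 1)
    lo, hi = 2.0 ** e_lo, 2.0 ** (e_lo + 1)   # neighbouring levels
    # P(round up) = (m - lo)/(hi - lo), so lo*(1-p) + hi*p = m (unbiased).
    p_up = np.clip((m - lo) / (hi - lo), 0.0, 1.0)
    q = np.where(rng.random(m.shape) < p_up, hi, lo)

    # Magnitudes below the smallest level are stochastically flushed to zero
    # (keep 2**e_min with probability m / 2**e_min) so the estimate stays
    # unbiased there as well.
    tiny = m < 2.0 ** e_min
    keep = rng.random(m.shape) < (m / 2.0 ** e_min)
    q = np.where(tiny, np.where(keep, 2.0 ** e_min, 0.0), q)

    out[nz] = q
    return sign * out
```

The key design choice in this sketch is that the up/down rounding probabilities are proportional to the distance to the two neighbouring power-of-two levels, so the expected quantized value equals the input and stochastic-gradient updates computed from the quantized gradients remain unbiased.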

Cited by 2 publications (14 citation statements)
References 9 publications

Citation statements
“…Several other authors have proposed full training strategies [22,20,34,35], but for bitcounts over 8. [36] proposes an approach using 4 bits, but fixes the representation, while our neuron is parametric, which will offer more flexibility. For such a study, the proposed approach also has the potential to use a different base b and an arbitrary activation function, provided that the training ensures that weights and activations remain below 1.…”
Section: Discussion (mentioning, confidence: 99%)
“…For example, the energy consumption of an INT32 multiplication is approximately 22x higher than that of an INT32 addition. Thus, there are multiplication-less methods that directly replace the multiplication with energy-efficient operations such as addition and bitwise shift [7,8,13,15,24,37,38]. Most of these works, such as INQ [38], ShiftCNN [15], and LogNN [24] also start with the FP32 pre-trained models rather than training from scratch so they cannot reduce the energy consumption of training.…”
Section: Introduction (mentioning, confidence: 99%)
“…Most of these works, such as INQ [38], ShiftCNN [15], and LogNN [24] also start with the FP32 pre-trained models rather than training from scratch so they cannot reduce the energy consumption of training. Among the methods that can train from scratch [7,8,13], AdderNet [7] replaces all of the FP32 multiplications in the linear layer with FP32 additions whose energy consumption is still higher than the fixed point operations. The other works [8,13] apply low-precision Power-of-Two (PoT) numbers, whose value is zero or a power of 2, to replace a part of the multiplications in training with bitwise shifts and sign flip operations.…”
Section: Introduction (mentioning, confidence: 99%)
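As a toy illustration of the shift-and-sign-flip replacement described in the statement above (the pot_multiply helper below is hypothetical, not code from any of the cited works): constraining a weight to zero or ±2^k lets an integer multiplication be computed with a bit shift and an optional sign flip.

```python
def pot_multiply(activation: int, exponent: int, sign: int) -> int:
    """Multiply an integer activation by a power-of-two weight w = sign * 2**exponent.

    sign is in {-1, 0, +1} and exponent >= 0 in this sketch.  Because
    activation << exponent equals activation * 2**exponent, the product
    needs no multiplier, only a shift and a sign flip.
    """
    if sign == 0:
        return 0
    shifted = activation << exponent   # multiply by 2**exponent via left shift
    return shifted if sign > 0 else -shifted


# Example: weight w = -2**3 = -8, activation 13 -> 13 * (-8) = -104
assert pot_multiply(13, 3, -1) == -104
```

In practice PoT weights are often fractional (2^-k), which in a fixed-point representation corresponds to an arithmetic right shift rather than the left shift shown here.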