2021
DOI: 10.48550/arxiv.2112.10769
Preprint

Logarithmic Unbiased Quantization: Simple 4-bit Training in Deep Learning

Abstract: Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Networks (DNNs) training. Current methods enable 4-bit quantization of the forward phase. However, this constitutes only a third of the training process. Reducing the computational footprint of the entire training process requires the quantization of the neural gradients, i.e., the loss gradients with respect to the outputs of intermediate neural layers. In this work, we examine the impor…
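The "logarithmic unbiased" idea in the title suggests rounding neural gradients to power-of-two levels with probabilities chosen so the quantizer is unbiased in expectation. Below is a minimal NumPy sketch of such a round-to-power-of-two quantizer; the function name, the exponent-range choice, and the underflow handling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_quantize_unbiased(x, exp_bits=3, rng=None):
    """Stochastically round each entry of x to a signed power-of-two level.

    Rounding probabilities are chosen so that E[quantized] == x, i.e. the
    quantizer is unbiased. The exponent range (2**exp_bits nonzero levels,
    anchored at the tensor's maximum) is an illustrative format choice,
    not the paper's exact one.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    sign, mag = np.sign(x), np.abs(x)
    out = np.zeros_like(mag)
    nz = mag > 0
    if not nz.any():
        return out

    e_max = np.ceil(np.log2(mag[nz].max()))   # top power-of-two level
    e_min = e_max - (2 ** exp_bits - 1)       # smallest nonzero level

    m = mag[nz]
    e_lo = np.clip(np.floor(np.log2(m)), e_min, e_max - 1)
    lo, hi = 2.0 ** e_lo, 2.0 ** (e_lo + 1)   # neighbouring levels
    # P(round up) = (m - lo)/(hi - lo), so lo*(1-p) + hi*p = m (unbiased).
    p_up = np.clip((m - lo) / (hi - lo), 0.0, 1.0)
    q = np.where(rng.random(m.shape) < p_up, hi, lo)

    # Magnitudes below the smallest level are stochastically flushed to zero
    # (keep 2**e_min with probability m / 2**e_min) so the estimate stays
    # unbiased there as well.
    tiny = m < 2.0 ** e_min
    keep = rng.random(m.shape) < (m / 2.0 ** e_min)
    q = np.where(tiny, np.where(keep, 2.0 ** e_min, 0.0), q)

    out[nz] = q
    return sign * out
```

The key design choice in this sketch is that the up/down rounding probabilities are proportional to the distance to the two neighbouring power-of-two levels, so the expected quantized value equals the input and stochastic-gradient updates computed from the quantized gradients remain unbiased.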

Cited by 2 publications (14 citation statements)
References 9 publications

Citation statements
“…Several other authors have proposed full training strategies [22,20,34,35], but for bitcounts over 8. [36] proposes an approach using 4 bits, but fixes the representation, while our neuron is parametric, which will offer more flexibility. For such a study, the proposed approach also has the potential to use a different base b and an arbitrary activation function, provided that the training ensures that weights and activations remain below 1.…”
Section: Discussion (mentioning, confidence: 99%)
“…For example, the energy consumption of an INT32 multiplication is approximately 22x higher than that of an INT32 addition. Thus, there are multiplication-less methods that directly replace the multiplication with energy-efficient operations such as addition and bitwise shift [7,8,13,15,24,37,38]. Most of these works, such as INQ [38], ShiftCNN [15], and LogNN [24] also start with the FP32 pre-trained models rather than training from scratch so they cannot reduce the energy consumption of training.…”
Section: Introduction (mentioning, confidence: 99%)
“…Most of these works, such as INQ [38], ShiftCNN [15], and LogNN [24] also start with the FP32 pre-trained models rather than training from scratch so they cannot reduce the energy consumption of training. Among the methods that can train from scratch [7,8,13], AdderNet [7] replaces all of the FP32 multiplications in the linear layer with FP32 additions whose energy consumption is still higher than the fixed point operations. The other works [8,13] apply low-precision Power-of-Two (PoT) numbers, whose value is zero or a power of 2, to replace a part of the multiplications in training with bitwise shifts and sign flip operations.…”
Section: Introduction (mentioning, confidence: 99%)
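As a toy illustration of the shift-and-sign-flip replacement described in the statement above (the pot_multiply helper below is hypothetical, not code from any of the cited works): constraining a weight to zero or ±2^k lets an integer multiplication be computed with a bit shift and an optional sign flip.

```python
def pot_multiply(activation: int, exponent: int, sign: int) -> int:
    """Multiply an integer activation by a power-of-two weight w = sign * 2**exponent.

    sign is in {-1, 0, +1} and exponent >= 0 in this sketch.  Because
    activation << exponent equals activation * 2**exponent, the product
    needs no multiplier, only a shift and a sign flip.
    """
    if sign == 0:
        return 0
    shifted = activation << exponent   # multiply by 2**exponent via left shift
    return shifted if sign > 0 else -shifted


# Example: weight w = -2**3 = -8, activation 13 -> 13 * (-8) = -104
assert pot_multiply(13, 3, -1) == -104
```

In practice PoT weights are often fractional (2^-k), which in a fixed-point representation corresponds to an arithmetic right shift rather than the left shift shown here.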