Abstract: Neural network quantization has become an important research area due to its great impact on the deployment of large models on resource-constrained devices. In order to train networks that can be effectively discretized without loss of performance, we introduce a differentiable quantization procedure. Differentiability can be achieved by transforming continuous distributions over the weights and activations of the network to categorical distributions over the quantization grid. These are subsequently relaxed to co…
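The relaxation described in this abstract can be illustrated with a minimal PyTorch sketch, assuming a fixed uniform grid and a temperature-controlled softmax over distance-based logits; the function name and the Gaussian-style logits are illustrative choices, not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F

def relaxed_quantize(w, grid, sigma=0.1, temperature=1.0):
    # Negative squared distance to every grid point acts as a logit,
    # so nearby grid points receive most of the probability mass.
    logits = -((w.unsqueeze(-1) - grid) ** 2) / (2 * sigma ** 2)
    probs = F.softmax(logits / temperature, dim=-1)
    # Soft (differentiable) assignment used during training ...
    w_soft = (probs * grid).sum(dim=-1)
    # ... and a hard snap to the nearest grid point for deployment.
    w_hard = grid[logits.argmax(dim=-1)]
    return w_soft, w_hard

# Example: a symmetric 2-bit grid
grid = torch.tensor([-1.5, -0.5, 0.5, 1.5])
w = torch.randn(8)
w_soft, w_hard = relaxed_quantize(w, grid)
```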
“…The proposed FAT is built on the PyTorch framework. We compare FAT with state-of-the-art approaches, including WAGE [40], LQ-Net [43], PACT [7], RQ [20], UNIQ [3], DQ [35], BCGD [2] [35], DSQ [10], QIL [13], HAQ [36], APoT [17], HMQ [11], DJPQ [38], LSQ [8].…”
Section: Methods
“…As shown in Table 2, we compare all methods that appeared in the main paper, including WAGE [40], LQ-Net [43], PACT [7], RQ [20], UNIQ [3], DQ [35], BCGD [2] [35], DSQ [10], QIL [13], HAQ [36], APoT [17], HMQ [11], DJPQ [38], LSQ [8].…”
Section: Categorization of Quantization Methods
Learning convolutional neural networks (CNNs) with low bitwidth is challenging because performance may drop significantly after quantization. Prior works often discretize the network weights by carefully tuning quantization hyper-parameters (e.g. non-uniform step size and layer-wise bitwidths), which is complicated and sub-optimal because of the large discrepancy between the full-precision and low-precision models. This work presents a novel quantization pipeline, Frequency-Aware Transformation (FAT), which has several appealing benefits. (1) Rather than designing complicated quantizers like existing works, FAT learns to transform network weights in the frequency domain before quantization, making them more amenable to training in low bitwidth. (2) With FAT, CNNs can be easily trained in low precision using simple standard quantizers without tedious hyper-parameter tuning. Theoretical analysis shows that FAT improves both uniform and non-uniform quantizers. (3) FAT can be easily plugged into many CNN architectures. When training ResNet-18 and MobileNet-V2 in 4 bits, FAT plus a simple rounding operation already achieves 70.5% and 69.2% top-1 accuracy on ImageNet without bells and whistles, outperforming the recent state of the art while reducing computation by 54.9× and 45.7× relative to the full-precision models. We hope FAT provides a novel perspective for model quantization. Code is available at https://github.com/ChaofanTao/FAT_Quantization.
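A rough PyTorch sketch of this idea is given below, assuming the frequency-domain transform is an FFT followed by a learnable element-wise scaling and that a symmetric uniform quantizer with a straight-through gradient is used afterwards; the class name and these specific choices are illustrative assumptions and do not reproduce the official FAT code linked above.

```python
import torch
import torch.nn as nn

class FreqTransformQuant(nn.Module):
    """Illustrative sketch only: transform weights in the frequency
    domain with a learnable mask, transform back, then quantize with
    a plain symmetric uniform quantizer."""

    def __init__(self, weight_shape, bits=4):
        super().__init__()
        self.mask = nn.Parameter(torch.ones(weight_shape))  # learnable frequency response
        self.bits = bits

    def forward(self, w):
        # Learned transformation of the weight spectrum.
        w_trans = torch.fft.ifftn(torch.fft.fftn(w) * self.mask).real
        # Simple symmetric uniform quantizer on the transformed weights.
        qmax = 2 ** (self.bits - 1) - 1
        scale = w_trans.abs().max() / qmax
        w_q = torch.clamp(torch.round(w_trans / scale), -qmax - 1, qmax) * scale
        # Straight-through estimator: quantized values in the forward
        # pass, gradients flow to the transformed weights and the mask.
        return w_trans + (w_q - w_trans).detach()
```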
“…In this section, we compare our GMPQ with the state-of-the-art fixed-precision models including APoT [25] and RQ [31] and mixed-precision networks including ALQ [38], HAWQ [9], EdMIPS [3], HAQ [50], BP-NAS [56], HMQ [13] and DQ [47] on ImageNet for image classification and on PASCAL VOC for object detection. We also provide the performance of full-precision models for reference.…”
Section: Comparison with State-of-the-Art Methods
In this paper, we propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Conventional methods require the dataset used for bitwidth search to match the one used for model deployment in order to guarantee policy optimality, leading to heavy search cost on challenging large-scale datasets in realistic applications. In contrast, our GMPQ searches for a mixed-precision quantization policy that generalizes to large-scale datasets using only a small amount of data, so that the search cost is significantly reduced without performance degradation. Specifically, we observe that correctly locating network attribution is a general ability for accurate visual analysis across different data distributions. Therefore, rather than directly pursuing higher model accuracy under complexity constraints, we preserve attribution rank consistency between the quantized models and their full-precision counterparts via efficient capacity-aware attribution imitation for generalizable mixed-precision quantization strategy search. Extensive experiments show that our method obtains a competitive accuracy-complexity trade-off compared with state-of-the-art mixed-precision networks at significantly reduced search cost. The code is available at https://github.com/ZiweiWangTHU/GMPQ.git.
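The attribution rank consistency idea can be sketched as a simple pairwise penalty, assuming attribution maps (e.g. Grad-CAM style) have already been computed for the quantized and full-precision models; the function name, random pair sampling, and hinge form below are illustrative assumptions, not the paper's exact capacity-aware loss.

```python
import torch

def attribution_rank_loss(attr_q, attr_fp, num_pairs=512):
    # Flatten spatial attribution maps to (batch, locations).
    a_q = attr_q.flatten(1)
    a_fp = attr_fp.flatten(1)
    # Sample random location pairs and record the full-precision ordering.
    idx = torch.randint(0, a_q.size(1), (2, num_pairs), device=a_q.device)
    target = torch.sign(a_fp[:, idx[0]] - a_fp[:, idx[1]])
    diff_q = a_q[:, idx[0]] - a_q[:, idx[1]]
    # Penalize the quantized model whenever its ordering flips
    # relative to the full-precision model.
    return torch.relu(-target * diff_q).mean()
```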
“…For both of the MNIST models, we found that letting each subcomponent of F be a simple dimension-wise scalar affine transform (similar to f_dense in Figure 3) was sufficient. Since each φ is quantized to integers, having a flexible scale and shift leads to flexible SQ, similar to (Louizos, Reisser, et al., 2018). Due to the small size of the networks, more complex transformation functions lead to too much overhead.…”
Section: MNIST Experiments
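As a concrete illustration of the dimension-wise affine decoder quoted above, a minimal PyTorch sketch follows; the class and parameter names are hypothetical, and the straight-through rounding of φ is an assumption about how the integer constraint is handled during training.

```python
import torch
import torch.nn as nn

class AffineReparam(nn.Module):
    """Sketch only: integer-valued latents phi are decoded into weights
    by a learnable per-dimension scale and shift."""

    def __init__(self, num_dims):
        super().__init__()
        self.phi = nn.Parameter(torch.zeros(num_dims))    # latent representation
        self.scale = nn.Parameter(torch.ones(num_dims))   # per-dimension scale
        self.shift = nn.Parameter(torch.zeros(num_dims))  # per-dimension shift

    def forward(self):
        # Round phi to integers with a straight-through gradient.
        phi_int = self.phi + (torch.round(self.phi) - self.phi).detach()
        return self.scale * phi_int + self.shift
```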
“…While these technically have a finite (but large) number of states, the best results in terms of both accuracy and bit rate are typically achieved for a significantly reduced number of states. Existing approaches to model compression often acknowledge this by quantizing each individual linear filter coefficient in an ANN to a small number of pre-determined values (Louizos, Reisser, et al., 2018; Baskin et al., 2018; F. Li et al., 2016).…”
We describe an end-to-end neural network weight compression approach that draws inspiration from recent latent-variable data compression methods. The network parameters (weights and biases) are represented in a "latent" space, amounting to a reparameterization. This space is equipped with a learned probability model, which is used to impose an entropy penalty on the parameter representation during training, and to compress the representation using arithmetic coding after training. We are thus maximizing accuracy and model compressibility jointly, in an end-to-end fashion, with the rate-error trade-off specified by a hyperparameter. We evaluate our method by compressing six distinct model architectures on the MNIST, CIFAR-10 and ImageNet classification benchmarks. Our method achieves state-of-the-art compression on VGG-16, LeNet-300-100 and several ResNet architectures, and is competitive on LeNet-5.
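The rate-error trade-off described in this abstract amounts to augmenting the task loss with an entropy penalty on the latent representation. The sketch below assumes a callable `prob_model` that returns per-latent probabilities, which is an illustrative stand-in for the learned probability model, not the authors' implementation.

```python
import torch

def penalized_objective(task_loss, latents, prob_model, lam=1e-3):
    # Expected number of bits needed to encode the latents with
    # arithmetic coding under the learned probability model.
    rate_bits = -torch.log2(prob_model(latents)).sum()
    # Rate-error trade-off controlled by the hyperparameter lam.
    return task_loss + lam * rate_bits
```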