2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00801
Adaptive Loss-Aware Quantization for Multi-Bit Networks

Abstract: We investigate the compression of deep neural networks by quantizing their weights and activations into multiple binary bases, known as multi-bit networks (MBNs), which accelerate the inference and reduce the storage for the deployment on low-resource mobile and embedded platforms. We propose Adaptive Loss-aware Quantization (ALQ), a new MBN quantization pipeline that is able to achieve an average bitwidth below one-bit without notable loss in inference accuracy. Unlike previous MBN quantization solutions that…
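To make the abstract's notion of "multiple binary bases" concrete, below is a minimal Python sketch (NumPy only) of a greedy multi-bit weight quantizer: a weight vector w is approximated as a sum of scaled {-1, +1} bases, w ≈ sum_i alpha_i * b_i. This is a common baseline fit and not the paper's ALQ pipeline; ALQ additionally adapts the bases and per-group bitwidths with respect to the training loss. The function names (multibit_quantize, multibit_dequantize) are illustrative only.

import numpy as np

def multibit_quantize(w, num_bases=2):
    """Greedily fit `num_bases` scaled binary bases to the weight vector w."""
    residual = w.astype(np.float64).copy()
    alphas, bases = [], []
    for _ in range(num_bases):
        b = np.where(residual >= 0, 1.0, -1.0)   # binary basis in {-1, +1}
        alpha = np.abs(residual).mean()          # least-squares optimal scale for b = sign(residual)
        alphas.append(alpha)
        bases.append(b)
        residual -= alpha * b                    # fit the next basis to what is left over
    return np.array(alphas), np.stack(bases)

def multibit_dequantize(alphas, bases):
    """Reconstruct the approximate weights from scales and binary bases."""
    return (alphas[:, None] * bases).sum(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=1024)
    alphas, bases = multibit_quantize(w, num_bases=2)
    w_hat = multibit_dequantize(alphas, bases)
    print("reconstruction MSE:", np.mean((w - w_hat) ** 2))

With two bases each weight costs two bits on average; the abstract's claim of an average bitwidth below one bit comes from ALQ adaptively assigning fewer (or no) bases to weight groups where the loss permits, which this sketch does not attempt.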

Cited by 38 publications (23 citation statements). References 13 publications.
“…In this section, we compare our GMPQ with the state-of-the-art fixed-precision models containing APoT [25] and RQ [31] and mixed-precision networks including ALQ [38], HAWQ [9], EdMIPS [3], HAQ [50], BP-NAS [56], HMQ [13] and DQ [47] on ImageNet for image classification and on PASCAL VOC for object detection. We also provide the performance of full-precision models for reference.…”
Section: Comparison With State-of-the-art Methods (mentioning, confidence: 99%)
“…Compression: Quantization reduces the bit-width of NN parameters, which permits a drastic reduction of the memory footprint [24,33,40]. It has become a standard compression technique in TinyML due to its significant memory savings while usually having a negligible effect on accuracy [11].…”
Section: Related Work (mentioning, confidence: 99%)
“…Whereas quantization can in principle be used with any bit-width, e.g. 4 bit [7] or an adaptive bitwidth [40], we focus on 8 bit quantization which is supported by most MCUs. Unsupported bit-widths need to be emulated, resulting in inefficient hardware utilization [3,5].…”
Section: Related Work (mentioning, confidence: 99%)
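For context on the 8-bit quantization the excerpt above focuses on, here is a minimal sketch of per-tensor int8 affine quantization in Python. The scale and zero-point formulas follow the common int8 scheme used by MCU-oriented inference runtimes; the function names are illustrative and not any specific library's API.

import numpy as np

def quantize_int8(x):
    """Map float values to int8 with a per-tensor scale and zero point."""
    x_min, x_max = float(x.min()), float(x.max())
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)   # make sure 0.0 is exactly representable
    scale = (x_max - x_min) / 255.0 or 1.0            # guard against a constant tensor
    zero_point = int(round(-128 - x_min / scale))     # x_min maps to -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

Fixing the representation to int8 keeps the arithmetic on natively supported integer units, which is exactly the hardware-utilization argument the citing work makes against emulating unsupported bit-widths.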
“…proposed, based on different compression methods such as knowledge distillation [4,31], pruning [15,19,20,40], quantization [32], neural architecture search (NAS) [38], etc. Among these categories, network pruning, which removes redundant and unimportant connections, is one of the most popular and promising compression methods, and recently received great interest from the industry that seeks to compress their AI models and fit them on small target devices with resource constraints.…”
Section: Introduction (mentioning, confidence: 99%)