2020
DOI: 10.1007/978-3-030-68238-5_7
One Weight Bitwidth to Rule Them All

Abstract: Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation and is especially important for applications where memory storage is limited. However, when aiming for quantization without accuracy degradation, different tasks may end up with different bitwidths. This creates complexity for software and hardware support and the complexity accumulates when one considers mixed-precision quantization, in which case each layer's weights use a…
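As a concrete illustration of the setting the abstract describes, the sketch below contrasts mixed-precision quantization (each layer's weights get their own bitwidth) with a single shared weight bitwidth. It assumes simple uniform symmetric "fake" quantization and uses hypothetical layer names and bitwidths; it is not the paper's training procedure.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bitwidth: int) -> np.ndarray:
    """Uniform symmetric quantize-dequantize of a weight tensor.

    The largest weight magnitude maps to the largest signed integer level
    for the given bitwidth; weights are rounded to the nearest level and
    scaled back to float (simulated quantization).
    """
    levels = 2 ** (bitwidth - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
# Hypothetical layer weights, stand-ins for real ConvNet tensors.
layers = {"conv1": rng.normal(size=(64, 3, 3, 3)),
          "dw_conv2": rng.normal(size=(64, 1, 3, 3))}

# Mixed precision: a (hypothetical) different bitwidth per layer.
mixed = {name: fake_quantize(w, b)
         for (name, w), b in zip(layers.items(), (8, 4))}

# One bitwidth shared by every layer, the setting the paper studies.
shared = {name: fake_quantize(w, 8) for name, w in layers.items()}
```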

Cited by 16 publications (3 citation statements)
References 33 publications
“…Quantizing networks with depth-wise separable layers (MobileNetV2, EfficientNet lite, DeeplabV3, EfficientDet-D1) is more challenging; a trend we also observed from the PTQ results in section 3.6 and discussed in the literature (Chin et al, 2020;Sheng et al, 2018a). Whereas 8-bit quantization incurs close to no accuracy drop, quantizing weights to 4 bits leads to a larger drop, e.g.…”
Section: Methods (supporting)
confidence: 64%
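As a toy numerical companion to the excerpt above (not taken from either cited paper), the snippet below measures how much larger the weight quantization error becomes when dropping from 8 to 4 bits on a random stand-in for a depth-wise 3x3 kernel. The accuracy drop reported for depth-wise separable networks depends on the full model, but the gap in error scale is visible even in this sketch.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bitwidth: int) -> np.ndarray:
    """Uniform symmetric quantize-dequantize of a weight tensor."""
    levels = 2 ** (bitwidth - 1) - 1
    scale = float(np.max(np.abs(w))) / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale

rng = np.random.default_rng(0)
# Stand-in for a depth-wise layer: one 3x3 kernel per channel, 64 channels.
w = rng.normal(scale=0.05, size=(64, 1, 3, 3))

for bits in (8, 4):
    err = float(np.mean((w - fake_quantize(w, bits)) ** 2))
    print(f"{bits}-bit weights: mean squared quantization error = {err:.2e}")
```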
“…Quantization (Low Precision Inference): A common solution is to compress NN models with quantization (Asanovic and Morgan, 1991 ; Hubara et al, 2016 ; Rastegari et al, 2016 ; Zhou et al, 2016 , 2017 ; Cai et al, 2017 , 2020b ; Choi et al, 2018 ; Jacob et al, 2018 ; Zhang et al, 2018a ; Dong et al, 2019 ; Wang et al, 2019c ; Chin et al, 2020 ; Gholami et al, 2021 ), where low bit-precision is used for weights/activations. A notable work here is Deep Compression (Han et al, 2016 ), which used quantization to compress the model footprint of the SqueezeNet model discussed above, bringing its size to 500x smaller than AlexNet.…”
Section: Technology State-of-the-art (mentioning)
confidence: 99%
“…To reduce the model size, quantization and pruning are used for model compression [42]. Quantization, which reduces the floating point precision of parameters and gradients, can be rule-based [43] or automated [44], with mixed bitwidths or optimized single bitwidth [45]. On the extreme end, binarized neural networks are quantized to 1, 2 or 3 bits [46] and provide superior efficiency, but at the cost of predictive accuracy.…”
Section: Resource Limited Devices (mentioning)
confidence: 99%
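As a loose illustration of the "extreme end" mentioned in the excerpt above, 1-bit weight quantization can be written as the sign of each weight times a single per-tensor scale. This is a binary-weight-network-style sketch only, not the exact scheme of any cited paper.

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> np.ndarray:
    """1-bit weight quantization: keep only the sign of each weight,
    scaled by the tensor's mean absolute value so the overall magnitude
    is roughly preserved."""
    alpha = float(np.mean(np.abs(w)))
    return alpha * np.sign(w)

w = np.random.default_rng(1).normal(size=(16, 16))
print("distinct values after binarization:", np.unique(binarize_weights(w)))
```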