FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer

Lin, Yang; Zhang, Tianyu; Sun, Peiqin; Li, Zheng; Zhou, Shuchang

doi:10.24963/ijcai.2022/164

Cited by 67 publications

(54 citation statements)

References 4 publications

“…The transformed formulas () and () still include hardware‐unfriendly exponents and divisions that would consume many hardware resources if using 32‐bit single‐precision floating‐point (FP32) directly. Since transformer is less affected by low‐precision calculations [11, 12], we use 16‐bit fixed‐point (INT16) to perform all softmax and GELU operations. Next, we optimize the hardware designs for softmax and GELU.…”

Section: Transformations Of Softmax and GELU (mentioning)

confidence: 99%

A high speed reconfigurable architecture for softmax and GELU in vision transformer

Zhang, Xie, et al. 2023. Electronics Letters

Transformers have been widely used in various computer vision applications. Compared to traditional convolutional neural networks (CNNs), transformer inference includes many non-linear operations, such as softmax and Gaussian error linear units (GELU). As the scale of transformers grows, an efficient hardware implementation of these operations becomes important. However, current work on computer vision neural network accelerators focuses on CNNs, and less attention is paid to transformers. In addition, most current FPGA-based softmax or GELU accelerators are not designed for the vision transformer (ViT). To solve this problem, this work proposes a high speed reconfigurable accelerator. The architecture can support both softmax and GELU functions in ViT by reconfiguring the data path. The architecture is implemented on a Xilinx XCVU37P through mathematical transformation and hardware optimization design, and achieves a performance of 102.4 gigabits per second (Gbps) at 200 MHz. Experimental results show that the architecture achieves a very small accuracy loss in ViT inference by using fixed-point 16-bit quantization. Compared with existing accelerators, the design has greater throughput and area efficiency.

“…Zhu et al (2021b) identifies the importance of different dimensions in each layer of ViTs and then executes model pruning. Liu et al (2021b); Lin et al (2022); Li et al (2022d) quantize weights and inputs to compress the learning model. Li et al (2022a) studies automated progressive learning that automatically increases the model capacity on-the-fly.…”

Section: Background and Related Work (mentioning)

confidence: 99%

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity

Hong-kang, Wang, Liu, et al. 2023. Preprint

Vision Transformers (ViTs) with self-attention modules have recently achieved great empirical success in many vision tasks. Due to non-convex interactions across layers, however, the theoretical learning and generalization analysis is mostly elusive. Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a shallow ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task. We characterize the sample complexity to achieve a zero generalization error. Our sample complexity bound is positively correlated with the inverse of the fraction of label-relevant tokens, the token noise level, and the initial model error. We also prove that a training process using stochastic gradient descent (SGD) leads to a sparse attention map, which is a formal verification of the general intuition about the success of attention. Moreover, this paper indicates that a proper token sparsification can improve the test performance by removing label-irrelevant and/or noisy tokens, including spurious correlations. Empirical experiments on synthetic data and CIFAR-10 dataset justify our theoretical results and generalize to deeper ViTs.

“…Unfortunately, to the best of our knowledge, we are not aware of any open-resource PTQ method specifically for LIC models. For fair comparison, we implement the Range-Adaptive Quantization (RAQ) (Hong et al, 2020) originally requiring model retraining as a PTQ approach; On the other hand, we also include the FQ-ViT (Lin et al, 2022) for comparative study. It is a PTQ method originally designed for image classification and objective detection using Transformer backbone.…”

Section: Comparison Setup (mentioning)

confidence: 99%

Rate-Distortion Optimized Post-Training Quantization for Learned Image Compression

Shi, Lu, Chen, et al. 2022. Preprint

Quantizing floating-point neural network to its fixed-point representation is crucial for Learned Image Compression (LIC) because it ensures the decoding consistency for interoperability and reduces space-time complexity for implementation. Existing solutions often have to retrain the network for model quantization which is time consuming and impractical. This work suggests the use of Post-Training Quantization (PTQ) to directly process pretrained, off-the-shelf LIC models. We theoretically prove that minimizing the mean squared error (MSE) in PTQ is suboptimal for compression task and thus develop a novel Rate-Distortion (R-D) Optimized PTQ (RDO-PTQ) to best retain the compression performance. Such RDO-PTQ just needs to compress few images (e.g., 10) to optimize the transformation of weight, bias, and activation of underlying LIC model from its native 32-bit floating-point (FP32) format to 8-bit fixed-point (INT8) precision for fixed-point inference onwards. Experiments reveal outstanding efficiency of the proposed method on different LICs, showing the closest coding performance to their floating-point counterparts. And, our method is a lightweight and plug-and-play approach without any need of model retraining which is attractive to practitioners.
