2020
DOI: 10.1007/s11263-020-01339-6

Hardware-Centric AutoML for Mixed-Precision Quantization

Abstract: Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emergent DNN hardware accelerators begin to support mixed precision (1-8 bits) to further improve the computation efficiency, which raises a great challenge to find the optimal bitwidth for each layer: it requires domain experts to explore the vast design space trading off accuracy, latency, energy, and model size, which is both time-consuming and usually sub-optimal. There are plenty of specialized ha…
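
To give a concrete sense of the design space the abstract describes, the sketch below enumerates per-layer bitwidth assignments for a toy three-layer network and counts how many fit a model-size budget. The layer names, parameter counts, and budget are illustrative assumptions, not values from the paper, and the brute-force enumeration only stands in for the automated search the paper proposes.

# Minimal sketch (not the paper's implementation) of one slice of the
# mixed-precision design space: which per-layer bitwidth policies fit a
# model-size budget. Layer names and parameter counts are hypothetical.
import itertools

layer_params = {"conv1": 4_608, "conv2": 73_728, "fc": 512_000}
allowed_bits = range(1, 9)  # accelerators supporting 1-8 bit operands

def model_size_bits(bit_policy):
    """Model size (in bits) of a given per-layer bitwidth assignment."""
    return sum(layer_params[name] * bits for name, bits in bit_policy.items())

# Exhaustive search is only feasible for toy cases: the space grows as 8^L,
# which is why automated (learning-based) search is used in practice.
budget_bits = 4 * sum(layer_params.values())  # e.g. match a uniform 4-bit model
feasible = []
for bits in itertools.product(allowed_bits, repeat=len(layer_params)):
    policy = dict(zip(layer_params, bits))
    if model_size_bits(policy) <= budget_bits:
        feasible.append(policy)
print(f"{len(feasible)} of {8 ** len(layer_params)} policies meet the size budget")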

Cited by 11 publications (7 citation statements)
References 17 publications (21 reference statements)
“…ReLeQ [83] and HAQ [84] adopt reinforcement learning to learn the wordlength of each data-structure in a layerwise manner. Specifically, the ReLeQ uses the predicted bit-precision level to quantize the weights as in WRPN, whereas the HAQ quantizes both weights and activations on each explored wordlength in the same way as TensorRT.…”
Section: Mixed-Precision Quantization (mentioning; confidence: 99%)
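
The layerwise quantization this excerpt describes can be illustrated with a minimal sketch of symmetric linear quantization at a chosen wordlength, in the spirit of the TensorRT-style scheme the quote mentions. The per-tensor scaling, the 4-bit/6-bit choices, and the function name are assumptions for illustration, not code from HAQ or ReLeQ.

import numpy as np

def linear_quantize(x, bits, signed=True):
    """Symmetric linear (uniform) quantization of a tensor to `bits` bits,
    returning the dequantized ("fake-quantized") values."""
    qmax = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    scale = np.abs(x).max() / qmax + 1e-12          # per-tensor scale (simplified)
    q = np.clip(np.round(x / scale), -qmax - 1 if signed else 0, qmax)
    return q * scale

# One layer quantized at an explored wordlength: 4-bit weights, 6-bit activations.
rng = np.random.default_rng(0)
w, a = rng.normal(size=(16, 16)), rng.normal(size=(16,))
w_q = linear_quantize(w, bits=4, signed=True)
a_q = linear_quantize(np.maximum(a, 0), bits=6, signed=False)  # post-ReLU activations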
“…As a summary of the literature, improving the accuracy of quantized DNNs comes at the expense of floating-point computational cost in [30], [32], [34], [35], [38], [42], [45], [56]- [58], [61], [63]- [67], [69], [74], [76], [78]- [80], [82]- [84], [86]- [88]. Specifically, these approaches scale output activations of each layer with FP32 coefficient(s) to recover the dynamic range, and/or perform batch normalization as well as the operations of first and last layers with FP32 datastructures.…”
Section: Mixed-Precision Quantization (mentioning; confidence: 99%)
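
A minimal sketch of the floating-point overhead this excerpt refers to: a low-precision matrix product is accumulated in integers and then rescaled by an FP32 coefficient to recover the dynamic range, while batch normalization and the first/last layers would typically stay in FP32. The function name, the 8-bit setting, and the scale values are hypothetical.

import numpy as np

def int_matmul_with_fp32_rescale(a_q, w_q, a_scale, w_scale):
    """Integer GEMM followed by a floating-point rescale.
    The FP32 coefficient (a_scale * w_scale) restores the dynamic range of the
    integer accumulator, which is the floating-point cost the survey notes."""
    acc = a_q.astype(np.int32) @ w_q.astype(np.int32)    # low-precision compute
    return acc.astype(np.float32) * (a_scale * w_scale)  # FP32 scaling per layer

# Hypothetical 8-bit inputs/weights with per-tensor scales.
rng = np.random.default_rng(1)
a_q = rng.integers(-128, 128, size=(1, 64), dtype=np.int8)
w_q = rng.integers(-128, 128, size=(64, 32), dtype=np.int8)
y = int_matmul_with_fp32_rescale(a_q, w_q, a_scale=0.02, w_scale=0.005)
# Batch norm (and typically the first/last layers) would still run in FP32 here.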
“…In order to tackle these, researchers have introduced different techniques to reduce the search cost, including differentiable architecture search [52], path-level binarization [53], single-path one-shot sampling [54], [55], [56], and weight sharing [50], [56], [57]. Furthermore, neural architecture search has also been used in compressing and accelerating neural networks, including pruning [35], [58], [59], [60], [61] and quantization [37], [54], [62], [63]. Most of these methods are tailored for 2D visual recognition, which has many well-defined search spaces [64].…”
Section: Neural Architecture Search (mentioning; confidence: 99%)
“…To tackle these, researchers have proposed different techniques to reduce the search cost, including differentiable architecture search [30], path-level binarization [6], single-path one-shot sampling [15,8,4], and weight sharing [50,4,57]. Besides, neural architecture search has also been used in compressing and accelerating neural networks, including pruning [17,31,5,27] and quantization [58,15,59,62]. Most of these methods are tailored for 2D visual recognition, which has many well-defined search spaces [44].…”
Section: Neural Architecture Search (mentioning; confidence: 99%)