2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr42600.2020.00215
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Abstract: We present APQ for efficient deep learning inference on resource-constrained hardware. Unlike previous methods that separately search the neural architecture, pruning policy, and quantization policy, we optimize them in a joint manner. To deal with the larger design space it brings, a promising approach is to train a quantization-aware accuracy predictor to quickly get the accuracy of the quantized model and feed it to the search engine to select the best fit. However, training this quantization-aware accuracy…
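The abstract's core idea, a predictor that scores a quantized architecture without training it, can be illustrated with a small model. Below is a minimal sketch, assuming an MLP over a concatenated encoding of the architecture and its per-layer bitwidths; the class name `QuantAwarePredictor`, the hidden size, and the encoding are illustrative, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class QuantAwarePredictor(nn.Module):
    """Illustrative quantization-aware accuracy predictor: maps an encoded
    (architecture, quantization policy) pair to a predicted accuracy."""
    def __init__(self, enc_dim: int, hidden: int = 400):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(enc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # accuracy in [0, 1]
        )

    def forward(self, arch_enc: torch.Tensor, quant_enc: torch.Tensor) -> torch.Tensor:
        # Concatenate architecture features (e.g. depths, kernel sizes) with
        # per-layer weight/activation bitwidths, then regress accuracy.
        return self.net(torch.cat([arch_enc, quant_enc], dim=-1))
```

Once fitted on (architecture, policy, measured accuracy) tuples, such a predictor lets a search engine rank huge numbers of candidates at the cost of one forward pass each.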

Cited by 149 publications (116 citation statements). References 24 publications.
“…In AMC [208] and [209], learning-based approaches are adopted to prune and quantize the models for algorithm-hardware co-design. In APQ [210], pruning and quantization are optimized jointly with the NN model, avoiding accuracy loss.…”
Section: G. Methods for Model Compression (confidence: 99%)
“…The exponentially large search space, consisting of billions of architectures or more, renders NAS a very challenging task [15,40,41,43,45,47]. The key reason is that evaluating and ranking the architectures in terms of metrics of interest (e.g., accuracy and latency) can be extremely time-consuming.…”
Section: Introduction (confidence: 99%)
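To make "exponentially large" concrete, a back-of-the-envelope count (with hypothetical numbers, not taken from the cited papers) already reaches astronomical sizes:

```python
# Hypothetical MobileNet-style space: 20 searchable layers, each choosing
# one of 3 kernel sizes x 3 expansion ratios = 9 options per layer.
layers, options_per_layer = 20, 9
print(f"{options_per_layer ** layers:.2e} candidate architectures")  # ~1.22e+19
```

Evaluating even a tiny fraction of such a space by full training is infeasible, which motivates the cost-reduction techniques cited next.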
“…The key reason is that evaluating and ranking the architectures in terms of metrics of interest (e.g., accuracy and latency) can be extremely time-consuming. As a result, many studies have focused on reducing the cost of training and evaluating architecture accuracy, including reinforcement learning-based NAS with accuracy evaluated on a small proxy dataset [52], differentiable NAS [45], one-shot or few-shot NAS [4,9,51], and NAS assisted by an accuracy predictor [15,43], among many others.…”
Section: Introduction (confidence: 99%)
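A hedged sketch of how such an accuracy predictor slots into a constrained search loop follows; `sample_candidate`, `encode`, and `latency_of` are placeholder callables, and the loop is generic rather than any one cited paper's algorithm.

```python
def predictor_guided_search(predictor, encode, sample_candidate,
                            latency_of, budget_ms, n_samples=10_000):
    """Rank randomly sampled candidates with a cheap learned predictor
    instead of training each one (illustrative only)."""
    best, best_score = None, float("-inf")
    for _ in range(n_samples):
        cand = sample_candidate()          # random architecture (+ policy)
        if latency_of(cand) > budget_ms:   # enforce hardware constraint first
            continue
        score = predictor(encode(cand))    # predicted accuracy, no training
        if score > best_score:
            best, best_score = cand, score
    return best
```

Only the handful of top-ranked candidates then need real training and measurement.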
“…The evolutionary design of neural networks, or neuroevolution, has recently led to the fully automated design of complex CNNs that are quite competitive in terms of accuracy and size, even for the most challenging datasets such as ImageNet [72]. To represent a candidate CNN in the genotype, a well-known CNN (such as MobileNetV2 in [73]) is usually taken as a template. The genotype then contains a set of parameters, each specifying possible values of the network's critical hyperparameters (the layer type, the number of filters, the kernel sizes, etc.).…”
Section: B. Hardware-Aware Neural Architecture Search (confidence: 99%)
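The genotype-as-template idea can be made concrete with a toy encoding. The sketch below assumes a MobileNetV2-like template where each stage picks a depth and each block picks a kernel size and expansion ratio; the gene pools and mutation rate are illustrative values, not taken from the cited work.

```python
import random

KERNELS, EXPANDS, DEPTHS = [3, 5, 7], [3, 4, 6], [2, 3, 4]  # hypothetical pools

def random_genotype(n_stages: int = 5) -> dict:
    """One candidate CNN as a flat dict of hyperparameter genes."""
    n_blocks = n_stages * max(DEPTHS)
    return {
        "depths": [random.choice(DEPTHS) for _ in range(n_stages)],
        "kernels": [random.choice(KERNELS) for _ in range(n_blocks)],
        "expands": [random.choice(EXPANDS) for _ in range(n_blocks)],
    }

def mutate(geno: dict, p: float = 0.1) -> dict:
    """Point mutation: resample each gene independently with probability p."""
    pools = {"depths": DEPTHS, "kernels": KERNELS, "expands": EXPANDS}
    child = {k: list(v) for k, v in geno.items()}
    for key, genes in child.items():
        for i in range(len(genes)):
            if random.random() < p:
                genes[i] = random.choice(pools[key])
    return child
```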
“…In [74], fixed-point quantization is applied as a post-processing step after NAS has finished. In APQ, by contrast, a suitable quantization scheme is evolved directly during NAS [73]. APQ thus performs a joint search for the architecture, pruning, and quantization policy, starting from the MobileNetV2 network.…”
Section: B. Hardware-Aware Neural Architecture Search (confidence: 99%)
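Extending the same genotype with per-layer bitwidth genes is enough to let the evolutionary operators explore the architecture and the quantization policy jointly; the bitwidth pool below is a hypothetical choice, not APQ's published configuration.

```python
import random

BITWIDTHS = [4, 6, 8]  # candidate weight/activation precisions (illustrative)

def add_quantization_genes(geno: dict, n_layers: int) -> dict:
    """Append per-layer quantization genes so crossover/mutation act on the
    architecture and the quantization scheme together (sketch only)."""
    geno = dict(geno)
    geno["w_bits"] = [random.choice(BITWIDTHS) for _ in range(n_layers)]
    geno["a_bits"] = [random.choice(BITWIDTHS) for _ in range(n_layers)]
    return geno
```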