2020
DOI: 10.1007/978-3-030-58520-4_23
Finding Non-uniform Quantization Schemes Using Multi-task Gaussian Processes

Cited by 5 publications (3 citation statements). References 14 publications.
“…Quantising 32-bit floating point numbers down to 8, 4 or even as low as 1 bit, these methods are able to drastically reduce the memory footprint of deep models. Other works [9,12,40] demonstrated that non-uniform quantisation schemes, where different layers of the network can be quantised to different numbers of bits, can further reduce the size of the model without compromising accuracy. These non-uniform schemes influence some of our experiments, where we employ a similar approach.…”
Section: Related Work
confidence: 99%
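The non-uniform schemes described in the passage above assign a different bit-width to each layer. The following is a minimal sketch of that idea, assuming simple symmetric uniform quantisation and an arbitrary per-layer bit assignment; the helper name, layer shapes, and bit-widths are illustrative only and do not reproduce the multi-task Gaussian process selection method proposed in the paper.

```python
# Minimal sketch of non-uniform (per-layer) weight quantisation.
# The bit_scheme assignment is an assumption for illustration; the paper
# chooses per-layer bit-widths with a multi-task Gaussian process.
import numpy as np

def quantise_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantisation of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                                # de-quantised ("fake-quantised") weights

# Hypothetical network: each layer gets its own bit-width (non-uniform scheme).
layers = {
    "conv1": np.random.randn(64, 3, 3, 3),
    "conv2": np.random.randn(128, 64, 3, 3),
    "fc":    np.random.randn(10, 128),
}
bit_scheme = {"conv1": 8, "conv2": 4, "fc": 2}      # assumed per-layer assignment

quantised = {name: quantise_symmetric(w, bit_scheme[name]) for name, w in layers.items()}
for name, w in layers.items():
    err = np.mean((w - quantised[name]) ** 2)
    print(f"{name}: {bit_scheme[name]}-bit, quantisation MSE {err:.5f}")
```

Lower bit-widths shrink the memory footprint of a layer but increase its quantisation error, which is the trade-off a non-uniform scheme balances across layers.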
“…Early NAS methods adopt reinforcement learning (RL) or evolutionary strategies [38,2,3,31,30,39] to search among thousands of individually trained networks, which incurs a huge computational cost. Recent works focus on efficient weight-sharing methods, which fall into two categories, one-shot approaches [6,4,1,7,18,33,29] and gradient-based approaches [32,27,9,8,20,12,34,23], and achieve state-of-the-art results on a series of tasks [10,17,24,35,16,28] in various search spaces. They construct a super network/graph that shares weights with all sub-networks/graphs.…”
Section: Related Work
confidence: 99%
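The weight-sharing idea mentioned in the passage above can be illustrated with a toy sketch: a super network holds weights for every candidate operation, and each sampled sub-network reuses those shared weights instead of being trained from scratch. The classes, dimensions, and three-way search space below are assumptions for illustration, not any specific method from the cited works.

```python
# Minimal sketch of weight sharing in one-shot NAS: candidate operations
# keep shared weights inside a "super" layer, and sampled sub-networks
# reuse them. Toy search space; names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

class SuperLayer:
    """One searchable layer: every candidate op owns a shared weight matrix."""
    def __init__(self, in_dim: int, out_dim: int, num_candidates: int = 3):
        self.candidates = [rng.standard_normal((in_dim, out_dim)) * 0.1
                           for _ in range(num_candidates)]

    def forward(self, x: np.ndarray, choice: int) -> np.ndarray:
        # Only the chosen candidate is evaluated; its weights are shared by
        # every sub-network that picks the same choice.
        return np.maximum(x @ self.candidates[choice], 0.0)

# A supernet of three searchable layers.
supernet = [SuperLayer(16, 16), SuperLayer(16, 16), SuperLayer(16, 4)]

def run_subnet(x: np.ndarray, arch: list) -> np.ndarray:
    """Evaluate one sampled architecture (one candidate choice per layer)."""
    for layer, choice in zip(supernet, arch):
        x = layer.forward(x, choice)
    return x

x = rng.standard_normal((2, 16))
print(run_subnet(x, arch=[0, 2, 1]).shape)   # sampled sub-network A
print(run_subnet(x, arch=[1, 1, 0]).shape)   # sub-network B reuses the same weights
```

Because every architecture in the search space is evaluated with the same shared parameters, the search avoids training thousands of networks individually, which is the efficiency gain the quoted passage refers to.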
“…Most lightweight networks are obtained either by compressing existing over-parameterized DNN models, called model compression, or by designing small networks directly. Typical model compression techniques include pruning [21,61], factorizing [11,64,30], quantizing [56,14,63], or distilling [24,42] the pretrained weights while maintaining competitive accuracy. To avoid pre-training over-parameterized large models, one can also focus on directly building small and efficient network architectures that can be trained from scratch.…”
Section: Related Work
confidence: 99%
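Of the compression techniques listed in the passage above, pruning is the simplest to sketch. Below is a minimal, hedged example of unstructured magnitude pruning; the sparsity level and tensor shape are arbitrary illustrations rather than the setting of any cited work.

```python
# Minimal sketch of unstructured magnitude pruning, one of the model
# compression techniques listed above; values are illustrative only.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

w = np.random.randn(256, 256)
pruned = magnitude_prune(w, sparsity=0.9)
print("non-zero fraction:", np.count_nonzero(pruned) / pruned.size)  # roughly 0.1
```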