2020
DOI: 10.1007/978-3-030-58520-4_23
Finding Non-uniform Quantization Schemes Using Multi-task Gaussian Processes

Cited by 5 publications (3 citation statements). References 14 publications.
“…Quantising 32-bit floating point numbers down to 8, 4 or even as low as 1 bit, these methods are able to drastically reduce the memory footprint of deep models. Other works [9,12,40] demonstrated that non-uniform quantisation schemes, where different layers of the network can be quantised to different numbers of bits, can further reduce the size of the model without compromising accuracy. These non-uniform schemes influence some of our experiments, where we employ a similar approach.…”
Section: Related Work
confidence: 99%
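The non-uniform schemes described in the passage above assign a different bit-width to each layer. The following is a minimal sketch of that idea, assuming simple symmetric uniform quantisation and an arbitrary per-layer bit assignment; the helper name, layer shapes, and bit-widths are illustrative only and do not reproduce the multi-task Gaussian process selection method proposed in the paper.

```python
# Minimal sketch of non-uniform (per-layer) weight quantisation.
# The bit_scheme assignment is an assumption for illustration; the paper
# chooses per-layer bit-widths with a multi-task Gaussian process.
import numpy as np

def quantise_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantisation of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                                # de-quantised ("fake-quantised") weights

# Hypothetical network: each layer gets its own bit-width (non-uniform scheme).
layers = {
    "conv1": np.random.randn(64, 3, 3, 3),
    "conv2": np.random.randn(128, 64, 3, 3),
    "fc":    np.random.randn(10, 128),
}
bit_scheme = {"conv1": 8, "conv2": 4, "fc": 2}      # assumed per-layer assignment

quantised = {name: quantise_symmetric(w, bit_scheme[name]) for name, w in layers.items()}
for name, w in layers.items():
    err = np.mean((w - quantised[name]) ** 2)
    print(f"{name}: {bit_scheme[name]}-bit, quantisation MSE {err:.5f}")
```

Lower bit-widths shrink the memory footprint of a layer but increase its quantisation error, which is the trade-off a non-uniform scheme balances across layers.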
“…Early NAS methods adopt reinforcement learning (RL) or evolutionary strategies [38,2,3,31,30,39] to search among thousands of individually trained networks, which incurs a huge computational cost. Recent works focus on efficient weight-sharing methods, which fall into two categories, one-shot approaches [6,4,1,7,18,33,29] and gradient-based approaches [32,27,9,8,20,12,34,23], and achieve state-of-the-art results on a series of tasks [10,17,24,35,16,28] in various search spaces. They construct a super network/graph that shares weights with all sub-networks/graphs.…”
Section: Related Work
confidence: 99%
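The weight-sharing idea mentioned in the passage above can be illustrated with a toy sketch: a super network holds weights for every candidate operation, and each sampled sub-network reuses those shared weights instead of being trained from scratch. The classes, dimensions, and three-way search space below are assumptions for illustration, not any specific method from the cited works.

```python
# Minimal sketch of weight sharing in one-shot NAS: candidate operations
# keep shared weights inside a "super" layer, and sampled sub-networks
# reuse them. Toy search space; names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

class SuperLayer:
    """One searchable layer: every candidate op owns a shared weight matrix."""
    def __init__(self, in_dim: int, out_dim: int, num_candidates: int = 3):
        self.candidates = [rng.standard_normal((in_dim, out_dim)) * 0.1
                           for _ in range(num_candidates)]

    def forward(self, x: np.ndarray, choice: int) -> np.ndarray:
        # Only the chosen candidate is evaluated; its weights are shared by
        # every sub-network that picks the same choice.
        return np.maximum(x @ self.candidates[choice], 0.0)

# A supernet of three searchable layers.
supernet = [SuperLayer(16, 16), SuperLayer(16, 16), SuperLayer(16, 4)]

def run_subnet(x: np.ndarray, arch: list) -> np.ndarray:
    """Evaluate one sampled architecture (one candidate choice per layer)."""
    for layer, choice in zip(supernet, arch):
        x = layer.forward(x, choice)
    return x

x = rng.standard_normal((2, 16))
print(run_subnet(x, arch=[0, 2, 1]).shape)   # sampled sub-network A
print(run_subnet(x, arch=[1, 1, 0]).shape)   # sub-network B reuses the same weights
```

Because every architecture in the search space is evaluated with the same shared parameters, the search avoids training thousands of networks individually, which is the efficiency gain the quoted passage refers to.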
“…Most lightweight networks are obtained either by compressing existing over-parameterized DNN models, called model compression, or by designing small networks directly. Typical model compression techniques include pruning [21,61], factorizing [11,64,30], quantizing [56,14,63], or distilling [24,42] the pretrained weights while maintaining competitive accuracy. To avoid pre-training over-parameterized large models, one can also focus on directly building small and efficient network architectures that can be trained from scratch.…”
Section: Related Work
confidence: 99%
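Of the compression techniques listed in the passage above, pruning is the simplest to sketch. Below is a minimal, hedged example of unstructured magnitude pruning; the sparsity level and tensor shape are arbitrary illustrations rather than the setting of any cited work.

```python
# Minimal sketch of unstructured magnitude pruning, one of the model
# compression techniques listed above; values are illustrative only.
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold
    return w * mask

w = np.random.randn(256, 256)
pruned = magnitude_prune(w, sparsity=0.9)
print("non-zero fraction:", np.count_nonzero(pruned) / pruned.size)  # roughly 0.1
```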