DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks

Dbouk, Hassan; Sanghvi, Hetul; Mehendale, Mahesh; Shanbhag, Naresh R.

doi:10.1007/978-3-030-58583-9_6

Cited by 8 publications

(9 citation statements)

References 15 publications

“…Quantization Reducing the complexity of CNNs via model quantization in the absence of any adversary is a well studied problem in the deep learning literature [32,33,2,46,15,27,3]. The role of quantization on adversarial robustness was studied in Defensive Quantization (DQ) [20] where it was observed that conventional post-training fixed-point quantization makes networks more vulnerable to adversarial perturbations than their full-precision counterparts.…”

Section: Background and Related Work (mentioning)

confidence: 99%

“…We measure the throughput in FPS by mapping the networks onto an NVIDIA Jetson Xavier via native PyTorch [25] commands. We experiment with VGG-16 [38], ResNet-18 [12], ResNet-50, and WideResNet-28-4 [45] network architectures, and report both natural accuracy (A_nat) and robust accuracy (A_rob). Following standard procedure, we report A_rob against ℓ∞-bounded perturbations generated via PGD [22] with standard attack strengths: ε = 8/255 with PGD-100 for both CIFAR-10 [18] and SVHN [24] datasets, and ε = 4/255 with PGD-50 for the ImageNet [31] dataset.…”

Section: Evaluation Setup (mentioning)

confidence: 99%

Generalized Depthwise-Separable Convolutions for Adversarially Robust and Efficient Neural Networks

Dbouk, Shanbhag

2021

Preprint

Self Cite

Despite their tremendous successes, convolutional neural networks (CNNs) incur high computational/storage costs and are vulnerable to adversarial perturbations. Recent works on robust model compression address these challenges by combining model compression techniques with adversarial training. But these methods are unable to improve throughput (frames-per-second) on real-life hardware while simultaneously preserving robustness to adversarial perturbations. To overcome this problem, we propose the method of Generalized Depthwise-Separable (GDWS) convolution, an efficient, universal, post-training approximation of a standard 2D convolution. GDWS dramatically improves the throughput of a standard pre-trained network on real-life hardware while preserving its robustness. Lastly, GDWS is scalable to large problem sizes since it operates on pre-trained models and doesn't require any additional training. We establish the optimality of GDWS as a 2D convolution approximator and present exact algorithms for constructing optimal GDWS convolutions under complexity and error constraints. We demonstrate the effectiveness of GDWS via extensive experiments on CIFAR-10, SVHN, and ImageNet datasets. Our code can be found at https://github.com/hsndbk4/GDWS.
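To give a rough sense of why a depthwise-separable approximation can reduce cost, the sketch below compares multiply-accumulate (MAC) counts for a standard 2D convolution and for the ordinary depthwise-plus-pointwise factorization. It does not reproduce the GDWS construction from the abstract; the function names and the layer sizes are illustrative assumptions, and MAC counts alone do not predict frames-per-second on real hardware.

```python
def conv_macs(c_in, c_out, kernel, out_h, out_w):
    """MACs for a standard 2D convolution with a square kernel (no bias)."""
    return kernel * kernel * c_in * c_out * out_h * out_w


def dws_macs(c_in, c_out, kernel, out_h, out_w):
    """MACs for a plain depthwise-separable convolution.

    Assumption: one depthwise kernel per input channel followed by a
    1x1 pointwise convolution. This is the textbook factorization,
    not the generalized (GDWS) form.
    """
    depthwise = kernel * kernel * c_in * out_h * out_w
    pointwise = c_in * c_out * out_h * out_w
    return depthwise + pointwise


# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 32x32 output map.
standard = conv_macs(64, 128, 3, 32, 32)
separable = dws_macs(64, 128, 3, 32, 32)
reduction = standard / separable
print(f"standard={standard}, separable={separable}, reduction={reduction:.2f}x")
```

For a fixed output size the ratio simplifies to (k² · C_out) / (k² + C_out), so the saving grows with the kernel size and the number of output channels.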

“…Therefore, many recent works focus on building resource-efficient deep neural networks to bridge the gap between the scale of deep neural networks and actual permissible computational complexity/memory-bounds for on-device model deployments. Some of these works consider designing computation- and memory-efficient modules for neural architectures, while others focus on compressing a given neural network by either pruning its weights [7,12,19,36] or reducing the bits used to represent the weights and activations [3,8,18]. The last approach, neural network quantization, is beneficial for building on-device AI systems since the edge devices oftentimes only support low bitwidth-precision parameters and/or operations.…”

Section: Introduction (mentioning)

confidence: 99%
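As a minimal illustration of "reducing the bits used to represent the weights", the sketch below applies a generic symmetric uniform quantizer to a short list of weights. This is not the DBQ method or any method from the citing papers; the function names and the example weights are assumptions made for illustration only.

```python
def quantize_uniform(values, num_bits):
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for a signed code")
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical full-precision weights quantized to 8 bits.
weights = [0.42, -1.27, 0.05, 0.9]
codes, scale = quantize_uniform(weights, num_bits=8)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

With round-to-nearest, the reconstruction error of each weight is bounded by half the step size (`scale / 2`), which is why fewer bits (a larger step) cost more accuracy.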

“…As shown in Figure 1 Right, although BRECQ [18] addresses the problem by considering the dependency between filters in each block, it is limited to the Post-Training Quantization (PTQ) problem, which suffers from inevitable information loss, resulting in inferior performance. The most recent Quantization-Aware Training (QAT) methods [8,21] are concerned with obtaining quantized weights by minimizing quantization losses with parameterized activation functions, disregarding cross-layer weight dependencies during training process. To the best of our knowledge, no prior work explicitly considers dependencies among the weights for QAT.…”

Section: Introduction (mentioning)

confidence: 99%

BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation

Park, Yoon, Zhang, et al.

2022

Preprint

Neural network quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation, while preserving the performance of the original model. However, extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures (e.g., MobileNets) often used for edge-device deployments results in severe performance degeneration. This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration even with extreme quantization by focusing on the inter-weight dependencies, between the weights within each layer and across consecutive layers. To minimize the quantization impact of each weight on others, we perform an orthonormal transformation of the weights at each layer by training an input-dependent correlation matrix and importance vector, such that each weight is disentangled from the others. Then, we quantize the weights based on their importance to minimize the loss of the information from the original weights/activations. We further perform progressive layer-wise quantization from the bottom layer to the top, so that quantization at each layer reflects the quantized distributions of weights and activations at previous layers. We validate the effectiveness of our method on various benchmark datasets against strong neural quantization baselines, demonstrating that it alleviates the performance degeneration on ImageNet and successfully preserves the full-precision model performance on CIFAR-100 with compact backbone networks.

“…The second is to reduce the size of neural networks so that their inference latencies are low enough to handle real-time inputs [3,4,5,6,7,8]. There are numerous methods to reduce the size of neural networks for different platforms, among which are CPUs, GPUs, and FPGAs.…”

Section: Introduction (mentioning)

confidence: 99%