Learning to Quantize Deep Neural Networks: A Competitive-Collaborative Approach

Khan, Fahim Faysal; Kamani, Mohammad Mahdi; Mahdavi, Mehrdad; Narayanan, Vijaykrishnan

doi:10.1109/dac18072.2020.9218576

Cited by 10 publications

(10 citation statements)

References 17 publications

“…We chose an SGD optimizer with an initial learning rate of 0.1 for all datasets. We reduce the learning rate by a factor of 0.1 at [100, 150] epochs for CIFAR-10, [30,60,90] for both ImageNet datasets. For CIFAR-100, we reduce the learning rate by 0.2 at [60,120,160] epochs.…”

Section: Results (mentioning)

confidence: 99%
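The training recipe quoted above is a standard multi-step (step-decay) learning-rate schedule. Below is a minimal sketch of that rule. It assumes "reduce the learning rate by a factor of 0.1" means multiplying by 0.1 at each milestone. It also reads the CIFAR-100 phrase "reduce the learning rate by 0.2" as multiplying by 0.2, which is an assumption, since the excerpt does not say which is meant. The function name `lr_at_epoch` and the `SCHEDULES` table are hypothetical.

```python
from bisect import bisect_right


def lr_at_epoch(epoch, base_lr, milestones, gamma):
    """Learning rate at a 0-indexed epoch under multi-step decay.

    The rate is multiplied by `gamma` once for every milestone that has
    already been reached (milestone <= epoch).
    """
    num_decays = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** num_decays


# Milestones and decay factors as read from the excerpt (hypothetical names).
SCHEDULES = {
    "CIFAR-10": ([100, 150], 0.1),
    "ImageNet": ([30, 60, 90], 0.1),
    "CIFAR-100": ([60, 120, 160], 0.2),  # assumes "reduce by 0.2" means gamma = 0.2
}

if __name__ == "__main__":
    for name, (milestones, gamma) in SCHEDULES.items():
        rates = [lr_at_epoch(e, 0.1, milestones, gamma) for e in [0] + milestones]
        print(name, [f"{r:.4g}" for r in rates])
```

In PyTorch, the same rule is available as `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)`, stepped once per epoch.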

“…However, the benefits of these achievements are limited in resource-constrained systems such as mobile devices, low power robots, etc. Different model compression algorithms [4,17,18,23,25,30,31] have been proposed to reduce the complexity of such larger models for these systems. Among different compression algorithms, distilling the knowledge from a larger model to a smaller one has shown to be 1.…”

Section: Introduction (mentioning)

confidence: 99%

Adaptive Distillation: Aggregating Knowledge from Multiple Paths for Efficient Distillation

Chennupati, Kamani, Cheng, et al. 2021

Preprint

Self Cite

Knowledge Distillation is becoming one of the primary trends among neural network compression algorithms to improve the generalization performance of a smaller student model with guidance from a larger teacher model. This momentous rise in applications of knowledge distillation is accompanied by the introduction of numerous algorithms for distilling the knowledge such as soft targets and hint layers. Despite this advancement in different techniques for distilling the knowledge, the aggregation of different paths for distillation has not been studied comprehensively. This is of particular significance, not only because different paths have different importance, but also due to the fact that some paths might have negative effects on the generalization performance of the student model. Hence, we need to adaptively adjust the importance of each path to maximize the impact of distillation on the student model. In this paper, we explore different approaches for aggregating these different paths and introduce our proposed adaptive approach based on multitask learning methods. We empirically demonstrate the effectiveness of the proposed approach over other baselines on the applications of knowledge distillation in classification, semantic segmentation, and object detection tasks.
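The abstract treats soft targets and hint layers as separate distillation "paths" whose losses must be combined. Below is a minimal sketch of one such path, the soft-target loss in the usual Hinton-style form T² · KL(p_teacher ‖ p_student), together with the simplest aggregation: a fixed, normalized weighted sum. That fixed-weight sum is the kind of baseline the abstract contrasts with. It is not the paper's adaptive, multitask-learning-based weighting. The default temperature of 4.0 and all function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target path: T^2 * KL(p_teacher || p_student) at temperature T."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(max(ps, 1e-12)))
        for pt, ps in zip(p_t, p_s)
        if pt > 0  # terms with zero teacher probability contribute nothing
    )
    return temperature ** 2 * kl


def aggregate_paths(path_losses, weights):
    """Fixed-weight baseline: normalized weighted sum of per-path losses."""
    if len(path_losses) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per path with a positive sum")
    return sum(w * loss for w, loss in zip(weights, path_losses)) / sum(weights)
```

A hint-layer path would add another entry to `path_losses`, for example a distance between intermediate feature maps. An adaptive scheme would replace the hand-set `weights` with values adjusted during training, so that paths that hurt the student can be down-weighted.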

“…In [49], [56], deciding the appropriate bit-precision level is manual and laborious so as to maintain accuracy. Thereafter, automated algorithms are designed that can discover the appropriate quantization level for each datastructure with accuracy in mind [57], [58].…”

Section: Bit-precision Multiply Accumulate (mentioning)

confidence: 99%

“…CCQ [58] performs stages of competition and collaboration to gradually adapt weight's wordlength. The competition stage is carried out to measure the effect of quantizing randomly chosen layers to next bit-precision level on accuracy and memory.…”

Section: Mixed-precision Quantization (mentioning)

confidence: 99%
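The excerpt describes CCQ's competition stage only at a high level: randomly chosen layers are tentatively moved to the next bit-precision level, and the effect on accuracy and memory is measured. Below is a minimal sketch of one competition round under several assumptions. It uses a hypothetical bit ladder `BIT_LEVELS`, a user-supplied `evaluate` callback that returns validation accuracy for a per-layer bit assignment, and a hypothetical score (memory saved per unit of accuracy lost) to pick the winning layer. This is not the CCQ authors' code. The collaboration stage is not sketched because the excerpt does not describe it.

```python
import random

BIT_LEVELS = [32, 16, 8, 4, 2]  # hypothetical precision ladder, high to low


def next_level(bits):
    """Next lower precision on the ladder, or None if already at the lowest."""
    i = BIT_LEVELS.index(bits)
    return BIT_LEVELS[i + 1] if i + 1 < len(BIT_LEVELS) else None


def memory_bits(bits, params):
    """Total weight storage in bits for per-layer bit widths and parameter counts."""
    return sum(b * n for b, n in zip(bits, params))


def competition_stage(bits, params, evaluate, num_candidates, rng, eps=1e-6):
    """Try lowering each randomly chosen layer by one level; return the winning layer.

    evaluate(bits) -> accuracy is a hypothetical user-supplied callback.
    Score = memory saved / accuracy lost (an assumed trade-off metric).
    """
    base_acc = evaluate(bits)
    eligible = [i for i, b in enumerate(bits) if next_level(b) is not None]
    candidates = rng.sample(eligible, min(num_candidates, len(eligible)))
    best_layer, best_score = None, float("-inf")
    for layer in candidates:
        trial = list(bits)
        trial[layer] = next_level(bits[layer])
        acc_drop = max(base_acc - evaluate(trial), 0.0)
        saved = memory_bits(bits, params) - memory_bits(trial, params)
        score = saved / (acc_drop + eps)
        if score > best_score:
            best_layer, best_score = layer, score
    return best_layer
```

Passing an explicit `random.Random(seed)` as `rng` keeps the random layer selection reproducible across runs.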

“…As a summary of the literature, improving the accuracy of quantized DNNs comes at the expense of floating-point computational cost in [30], [32], [34], [35], [38], [42], [45], [56]- [58], [61], [63]- [67], [69], [74], [76], [78]- [80], [82]- [84], [86]- [88]. Specifically, these approaches scale output activations of each layer with FP32 coefficient(s) to recover the dynamic range, and/or perform batch normalization as well as the operations of first and last layers with FP32 datastructures.…”

Section: Mixed-precision Quantization (mentioning)

confidence: 99%
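The floating-point cost mentioned in the excerpt comes from scaling integer results with FP32 coefficients. Below is a minimal sketch using symmetric per-tensor linear quantization, a common scheme chosen here for illustration. The multiply-accumulate runs on integers only, but the result is then multiplied by the FP32 product of the weight and activation scales to recover the dynamic range. Function names and the 8-bit default are assumptions.

```python
def symmetric_scale(values, num_bits=8):
    """FP32 scale mapping the largest |value| to the largest signed integer."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer level and clip to [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def quantized_dot(weights, activations, num_bits=8):
    """Integer multiply-accumulate followed by an FP32 rescaling step."""
    s_w = symmetric_scale(weights, num_bits)
    s_x = symmetric_scale(activations, num_bits)
    q_w = quantize(weights, s_w, num_bits)
    q_x = quantize(activations, s_x, num_bits)
    acc = sum(a * b for a, b in zip(q_w, q_x))  # integer-only accumulation
    return acc * (s_w * s_x)  # floating-point scaling that integer-only designs avoid
```

An integer-arithmetic-only design has to replace the final `acc * (s_w * s_x)` with an integer or fixed-point operation, for example by restricting the scales to powers of two so that rescaling becomes a bit shift.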

FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs With Dynamic Fixed-Point Representation

2022

Deep neural networks (DNNs) have demonstrated their effectiveness in a wide range of computer vision tasks, with state-of-the-art results obtained through complex and deep structures that require intensive computation and memory. In the past, graphics processing units enabled these breakthroughs because of their greater computational speed. Nowadays, efficient model inference is crucial for consumer applications on resource-constrained platforms. As a result, there is much interest in the research and development of dedicated deep learning (DL) hardware to improve the throughput and energy efficiency of DNNs. Low-precision representation of DNN data structures through quantization would bring great benefits to specialized DL hardware, especially when expensive floating-point operations can be avoided and replaced by more efficient fixed-point operations. However, aggressive quantization leads to a severe accuracy drop. As such, quantization opens a large hyper-parameter space of bit-precision levels, the exploration of which is a major challenge. In this paper, we propose a novel framework referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet) that flexibly designs a mixed low-precision DNN for integer-arithmetic-only deployment. Specifically, FxP-QNet gradually adapts the quantization level for each data structure of each layer based on the trade-off between network accuracy and the low-precision requirements. Additionally, it employs post-training self-distillation and network prediction error statistics to optimize the quantization of floating-point values into fixed-point numbers. Examining FxP-QNet on state-of-the-art architectures and the benchmark ImageNet dataset, we empirically demonstrate the effectiveness of FxP-QNet in achieving the accuracy-compression trade-off without the need for training. The results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16×, 10.36×, and 6.44× with less than 0.95%, 0.95%, and 1.99% accuracy drop, respectively.
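In a dynamic fixed-point representation, each data structure gets its own word length W and fractional length F, and a value is stored as a signed W-bit integer times 2^-F. Below is a minimal sketch of that conversion with round-to-nearest and saturation. The rule for choosing F, which takes enough integer bits to cover the largest magnitude, is a simple assumption for illustration. FxP-QNet itself uses self-distillation and prediction error statistics, which are not reproduced here. The helper `compression_ratio` is hypothetical and estimates storage savings relative to 32-bit floats.

```python
import math


def fractional_length(values, word_length):
    """Pick F so the largest |value| fits a signed W-bit word (assumed max-magnitude rule)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_length - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits left of the binary point
    return word_length - 1 - int_bits  # one bit is reserved for the sign


def to_fixed_point(values, word_length, frac_length):
    """Round to multiples of 2^-F and saturate to the signed W-bit integer range."""
    lo, hi = -(2 ** (word_length - 1)), 2 ** (word_length - 1) - 1
    step = 2.0 ** -frac_length
    return [max(lo, min(hi, round(v / step))) * step for v in values]


def compression_ratio(word_lengths, counts):
    """Storage of 32-bit floats divided by storage at the given per-structure word lengths."""
    return 32 * sum(counts) / sum(w * n for w, n in zip(word_lengths, counts))
```

Because F is chosen per data structure, small-magnitude tensors can use F larger than W - 1 (all bits fractional), which is what makes the representation "dynamic" compared with a single global fixed-point format.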

Mixed Precision Quantization Scheme for Re-configurable ReRAM Crossbars Targeting Different Energy Harvesting Scenarios

Khan, Jao, Shuai, et al. 2020

IFIP Advances in Information and Communication Technology

Self Cite

Crossbar arrays with non-volatile memory have recently become very popular for DNN acceleration due to their in-memory computing property and low power requirements, which make them suitable for deployment at the edge. Quantized neural networks (QNNs) enable us to run inference with limited hardware resources and power availability and can easily be ported to smaller devices. On the other hand, energy harvesting has shown great promise for making edge devices self-sustainable. However, the power supplied by energy harvesting sources is not constant, which is problematic because a fixed, trained neural network requires a constant amount of power to run inference. This work addresses this issue by tuning network precision at layer granularity for the variable power availability predicted for different energy harvesting scenarios.
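To illustrate what "tuning network precision at layer granularity" for a power budget can look like, here is a minimal sketch built on a hypothetical energy model. Energy per inference is assumed to be proportional to the sum over layers of bit width times multiply-accumulate count. A greedy loop then halves the precision of the most energy-hungry layer until the predicted budget is met. The energy model, the halving ladder, the `min_bits` floor, and the greedy heuristic are all assumptions for illustration, not the scheme proposed in the paper.

```python
def network_energy(bits, macs):
    """Hypothetical energy model: sum of bit width x MAC count over layers."""
    return sum(b * m for b, m in zip(bits, macs))


def fit_to_budget(bits, macs, budget, min_bits=2):
    """Greedily halve per-layer precision until the energy budget is met.

    Returns new per-layer bit widths, or None if the budget cannot be met
    without going below `min_bits` in some layer.
    """
    bits = list(bits)
    while network_energy(bits, macs) > budget:
        reducible = [i for i, b in enumerate(bits) if b // 2 >= min_bits]
        if not reducible:
            return None
        # Halving layer j saves bits[j] * macs[j] / 2, so pick the largest product.
        i = max(reducible, key=lambda j: bits[j] * macs[j])
        bits[i] //= 2
    return bits
```

A real deployment would precompute one configuration per predicted power level, because re-running a search on the device at every power change would itself consume harvested energy.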
