2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8461456
True Gradient-Based Training of Deep Binary Activated Neural Networks Via Continuous Binarization

Cited by 24 publications (15 citation statements) · References 4 publications
“…On the other hand, because of the nondifferentiable quantizer, some literature focuses on relaxing the discrete optimization problem. A typical approach is to train with regularization [13,49,2,1,33,8], where the optimization problem becomes continuous while gradually adjusting the data distribution towards quantized values. Apart from the two challenges, with the popularization of neural architecture search (NAS), Wang et al. [38] further propose to employ reinforcement learning to automatically determine the bit-width of each layer without human heuristics.…”
Section: Related Work
confidence: 99%
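As a rough illustration of the regularization idea described in the statement above (not the exact formulation of any of the cited works [13,49,2,1,33,8]), the sketch below adds a penalty that vanishes only when weights sit at the binary values ±1, so minimizing it alongside the task loss keeps the problem continuous while gradually pulling the weight distribution toward quantized values. The regularizer shape, the stand-in task loss, and the weighting factor `lambda_q` are assumptions for illustration.

```python
import torch

def binary_regularizer(weights):
    """W-shaped penalty: zero exactly when every weight is +1 or -1.

    Illustrative choice; the cited works use various regularizer shapes
    and annealing schedules for the penalty weight.
    """
    return ((weights.abs() - 1.0) ** 2).sum()

# Combine with a (stand-in) task loss; in practice lambda_q would be
# ramped up over training so weights drift toward {-1, +1} gradually.
w = torch.randn(256, 128, requires_grad=True)
task_loss = (w @ torch.randn(128)).pow(2).mean()   # placeholder for the real loss
lambda_q = 0.1                                     # hypothetical schedule value
loss = task_loss + lambda_q * binary_regularizer(w)
loss.backward()                                    # gradients remain well-defined
```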
“…In particular, [9,15,28] set the foundations for 1-bit quantization, while [16,50] did so for arbitrary bit-width quantization. Progressive quantization [2,1,53,33], loss-aware quantization [13,49], improved gradient estimators for non-differentiable functions [21], and RL-aided training [20] have focused on improved training schemes, while mixed-precision quantization [36], hardware-aware quantization [37], and architecture search for quantized models [34] have focused on alternatives to standard quantized models. However, these strategies are exclusively focused on improving the performance and efficiency of static networks.…”
Section: Introduction
confidence: 99%
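Purely as an illustration of the "progressive quantization" idea name-checked above (the exact schedules in [2,1,53,33] differ), the sketch below applies a uniform quantizer whose bit-width is lowered stage by stage during training, so the network adapts to increasingly coarse weights. Both the quantizer and the epoch-based schedule are assumptions.

```python
import torch

def uniform_quantize(x, bits):
    """Symmetric uniform quantizer to the given bit-width (illustrative)."""
    levels = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / levels
    return torch.round(x / scale) * scale

def bitwidth_schedule(epoch):
    """Hypothetical progressive schedule: 8 -> 4 -> 2 bits."""
    if epoch < 10:
        return 8
    if epoch < 20:
        return 4
    return 2

w = torch.randn(64, 64)
for epoch in (0, 10, 20):
    w_q = uniform_quantize(w, bitwidth_schedule(epoch))
    print(epoch, bitwidth_schedule(epoch), w_q.unique().numel())
```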
“…Continuous binarization: a few recent works proposed to use a continuous activation function that increasingly resembles a binary activation function during training, thereby eliminating the approximation step across the activation function. Sakr et al. (2018) used a piecewise linear function whose slope gradually increases, while and Gong et al. (2019) proposed to use the sigmoid and tanh functions, respectively.…”
Section: Sophisticated STEs
confidence: 99%
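Since the paper under review belongs to this "continuous binarization" family, a minimal sketch of the idea follows: a bounded piecewise-linear activation whose slope is raised over training so that it approaches the binary sign function, while staying differentiable (almost everywhere) at every stage, so true gradients can be used. The specific slope values below are an assumed annealing schedule, not the one used by Sakr et al. (2018).

```python
import torch

def continuous_binarization(x, slope):
    """Piecewise-linear surrogate for sign(x), clipped to [-1, 1].

    For a small slope this is a soft ramp; as the slope grows it approaches
    the binary activation sign(x) while remaining differentiable a.e.
    """
    return torch.clamp(slope * x, -1.0, 1.0)

x = torch.linspace(-2, 2, 9, requires_grad=True)
for slope in (1.0, 4.0, 16.0):         # hypothetical slope annealing
    y = continuous_binarization(x, slope)
    y.sum().backward()                  # true gradients exist at every stage
    print(slope, y.detach(), x.grad.clone())
    x.grad.zero_()
```

The design point this sketch tries to capture is that, unlike an STE, no gradient is invented: the function actually used in the forward pass is the one being differentiated, and only its slope changes over training.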
“…Reducing DNN complexity via quantization has been an active area of research over the past few years. A majority of such works either train the quantized network from scratch [36,33,16,10,22,24] or fine-tune a pre-trained model with quantization-in-the-loop [13,19,30,32,1,35]. Where retraining is not an option, [25] provides analytical guarantees on the minimum precision requirements of a pre-trained FP network given a budget on the accuracy drop from FP.…”
Section: Related Work
confidence: 99%
“…Where retraining is not an option, [25] provides analytical guarantees on the minimum precision requirements of a pre-trained FP network given a budget on the accuracy drop from FP. Training-based quantization works fall into two classes of methods: 1) estimation-based methods [33,18,16,30,13], where the full-precision weights and activations are quantized in the forward path, and gradients are back-propagated through a non-differentiable quantizer function via a gradient estimator such as the Straight Through Estimator (STE) [2]; and 2) optimization-based methods, where gradients flow directly from the full-precision weights to the cost function via an approximate differentiable quantizer [32,19,24], or by including an explicit quantization error term in the loss function [7,35]. Application of these methods can be categorized into three clusters: Aggressive Quantization: methods such as binarization and ternarization have been highly successful for reducing DNN complexity.…”
Section: Related Work
confidence: 99%
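To make the contrast drawn in that statement concrete, here is a minimal sketch (assuming PyTorch) of the estimation-based approach: the forward pass applies a non-differentiable sign quantizer, and the backward pass uses the Straight Through Estimator [2], passing the incoming gradient through unchanged wherever |x| ≤ 1 (a common clipped variant). This is an illustrative implementation, not the exact code of any cited work.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: non-differentiable sign(x).
    Backward: clipped straight-through estimator."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Identity gradient inside [-1, 1], zero outside.
        return grad_output * (x.abs() <= 1.0).to(grad_output.dtype)

x = torch.randn(5, requires_grad=True)
y = BinarizeSTE.apply(x)
y.sum().backward()
print(y, x.grad)   # quantized forward values, pass-through gradients
```

Optimization-based methods, by contrast, avoid this mismatch between forward and backward passes by differentiating a smooth surrogate quantizer directly, as in the continuous-binarization sketch above, or by penalizing quantization error in the loss.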