2018
DOI: 10.48550/arxiv.1801.06313
Preprint
BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights

Cited by 14 publications (27 citation statements)
References 24 publications
“…The DNN representation of linear finite element functions opens a door for theoretical explanation and possible improvement on the application of quantized weights in convolutional neural networks (see [17]).…”
Section: Discussion (mentioning)
confidence: 99%
“…In this section, we show the rationality of low bit-width models with respect to their approximation properties by demonstrating that a special type of ReLU DNN model can also recover all CPWL functions. In [17], an incremental network quantization strategy is proposed for transforming a general trained CNN into a low bit-width version in which the parameters are all zeros or powers of two. Mathematically speaking, a low bit-width DNN model is defined as:…”
Section: Low Bit-width DNN Models (mentioning)
confidence: 99%
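A minimal sketch of such a "zeros or powers of two" weight set, showing how a trained weight tensor could be projected onto the nearest admissible level. The function name, the exponent-range heuristic, and the codebook size are illustrative assumptions, not the exact rule of the incremental quantization scheme cited above:

```python
import numpy as np

def quantize_pow2(w, num_exponents=4):
    """Map each entry of w to the nearest value in {0} ∪ {±2^e}.

    The exponents e span num_exponents powers of two just below max|w|;
    this range heuristic is an assumption for illustration only.
    """
    w = np.asarray(w, dtype=np.float64)
    e_max = int(np.floor(np.log2(np.max(np.abs(w)) + 1e-12)))
    exponents = np.arange(e_max - num_exponents + 1, e_max + 1)
    levels = np.concatenate(([0.0], 2.0 ** exponents, -(2.0 ** exponents)))
    # Brute-force nearest-level lookup over the small codebook.
    idx = np.argmin(np.abs(w[..., None] - levels), axis=-1)
    return levels[idx]

print(quantize_pow2([0.9, -0.3, 0.02, 0.0]))  # -> [ 0.5  -0.25  0.    0.  ]
```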
“…To this end, we follow (Cai et al., 2017) and resort to a modified batch normalization layer (Ioffe & Szegedy, 2015) without the scale and shift, whose output components approximately follow a unit Gaussian distribution. The α that best fits the input of the activation layer can then be pre-computed by a variant of Lloyd's algorithm (Lloyd, 1982; Yin et al., 2018a) applied to a set of simulated 1-D half-Gaussian data. Once determined, α is fixed during the whole training process.…”
Section: Methods (mentioning)
confidence: 99%
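The α-fitting step described above can be sketched as a Lloyd-style alternation on simulated half-Gaussian samples. The uniform-level parameterization {0, α, 2α, …} and the least-squares α update below are assumptions chosen for illustration, not necessarily the exact variant used in the cited work:

```python
import numpy as np

def fit_alpha_half_gaussian(num_levels=4, n_samples=100_000, n_iters=50, seed=0):
    """Fit the resolution alpha of a uniform quantizer
    {0, alpha, ..., (num_levels-1)*alpha} to unit half-Gaussian data
    with a Lloyd-style alternating scheme (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.abs(rng.standard_normal(n_samples))  # simulated 1-D half-Gaussian data
    alpha = x.mean()                            # crude initialization
    for _ in range(n_iters):
        # Assignment step: nearest quantization-level index for each sample.
        m = np.clip(np.round(x / alpha), 0, num_levels - 1)
        # Update step: least-squares fit of alpha to the assigned levels.
        alpha = (m * x).sum() / (m * m).sum()
    return alpha

print(fit_alpha_half_gaussian())  # fitted alpha for a 4-level half-Gaussian quantizer
```

Because α is computed once from simulated data rather than from the network's activations, it can indeed be pre-computed and frozen before training starts, as the excerpt notes.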
“…It calls for minimizing a piecewise constant and highly nonconvex empirical risk function f(w) subject to a discrete set-constraint w ∈ Q that characterizes the quantized weights. In particular, weight quantization of DNNs has been extensively studied in the literature; see, for example, (Li et al., 2016; Zhu et al., 2016; Yin et al., 2016; 2018a; Hou & Kwok, 2018; He et al., 2018; Li & Hao, 2018). On the other hand, the gradient ∇f(w) in training an activation-quantized DNN is almost everywhere (a.e.) …”
Section: Introduction (mentioning)
confidence: 99%
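The constrained problem described in this excerpt, together with the relaxation that gives BinaryRelax its name, can be written as below. The penalized form and the increasing schedule for λ are a sketch of the relaxation idea rather than a quotation from this page:

```latex
% Constrained training of a weight-quantized DNN (as described in the excerpt)
\min_{w \in \mathbb{R}^n} f(w) \quad \text{subject to} \quad w \in Q .
% Relaxed (penalized) form in the spirit of BinaryRelax -- a sketch, not a quote:
% the hard constraint is replaced by a squared distance penalty whose weight
% \lambda is gradually increased during training, driving w toward Q.
\min_{w \in \mathbb{R}^n} \; f(w) + \frac{\lambda}{2}\,\operatorname{dist}(w, Q)^2 ,
\qquad \lambda \nearrow \infty .
```

Here Q is the discrete set of admissible quantized weights, e.g. {±1}^n (up to a scaling factor) in the binary case.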