We propose BinaryRelax, a simple two-phase algorithm for training deep neural networks with quantized weights. The set constraint that characterizes the quantization of weights is not imposed until the late stage of training; instead, a sequence of pseudo-quantized weights is maintained. Specifically, we relax the hard constraint into a continuous regularizer via the Moreau envelope, which turns out to be the squared Euclidean distance to the set of quantized weights. The pseudo-quantized weights are obtained by linearly interpolating between the float weights and their quantizations. A continuation strategy is adopted to push the weights toward the quantized state by gradually increasing the regularization parameter. In the second phase, an exact quantization scheme with a small learning rate is invoked to guarantee fully quantized weights. We test BinaryRelax on the benchmark CIFAR and ImageNet color image datasets to demonstrate the superiority of the relaxed quantization approach and the improved accuracy over state-of-the-art training methods. Finally, we prove the convergence of BinaryRelax under an approximate orthogonality condition.
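A minimal NumPy sketch of the two-phase idea described above, not the authors' implementation: the scaled-sign projection `binary_quantize`, the placeholder gradient, and all step sizes and schedules are illustrative assumptions, chosen only to show the relaxed (Moreau-envelope) update and the continuation on the regularization parameter.

```python
import numpy as np

def binary_quantize(w):
    # Illustrative projection onto scaled binary weights {-a, +a};
    # a = mean(|w|) is one common choice (an assumption here).
    a = np.mean(np.abs(w))
    return a * np.sign(w)

def pseudo_quantize(w, lam):
    # Moreau-envelope relaxation: linear interpolation between the float
    # weights and their quantization, u = (w + lam * proj(w)) / (1 + lam).
    return (w + lam * binary_quantize(w)) / (1.0 + lam)

rng = np.random.default_rng(0)
w = rng.standard_normal(16)          # float (auxiliary) weights
lam, growth, lr = 1.0, 1.02, 0.05

# Phase 1: relaxed quantization with continuation on lam.
for t in range(300):
    u = pseudo_quantize(w, lam)      # weights used in the forward pass
    grad = u                         # placeholder for the backprop gradient at u
    w = w - lr * grad                # gradient step kept on the float weights
    lam *= growth                    # continuation pushes u toward the quantized set

# Phase 2: exact quantization with a small learning rate.
lr = 0.005
for t in range(50):
    u = binary_quantize(w)
    grad = u
    w = w - lr * grad
```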
Quantized deep neural networks (QDNNs) are attractive due to their much lower memory storage and faster inference speed than their regular full-precision counterparts. To maintain the same performance level, especially at low bit-widths, QDNNs must be retrained. Their training involves piecewise constant activation functions and discrete weights; hence, mathematical challenges arise. We introduce the notion of coarse gradient and propose the blended coarse gradient descent (BCGD) algorithm for training fully quantized neural networks. A coarse gradient is generally not the gradient of any function but an artificial ascent direction. The BCGD weight update applies a coarse gradient correction to a weighted average of the full-precision weights and their quantization (the so-called blending), which yields sufficient descent in the objective value and thus accelerates the training. Our experiments demonstrate that this simple blending technique is very effective for quantization at extremely low bit-widths such as binarization. In full quantization of ResNet-18 for the ImageNet classification task, BCGD gives 64.36% top-1 accuracy with binary weights across all layers and 4-bit adaptive activation. If the weights in the first and last layers are kept in full precision, this number increases to 65.46%. As theoretical justification, we show a convergence analysis of coarse gradient descent for a two-linear-layer neural network model with Gaussian input data, and prove that the expected coarse gradient correlates positively with the underlying true gradient.

Keywords: weight/activation quantization · blended coarse gradient descent · sufficient descent property · deep neural networks

Mathematics Subject Classification (2010): 90C35, 90C26, 90C52, 90C90
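A minimal sketch of the blended update described above, again not the authors' code: the binary `quantize` function, the blending parameter `rho`, and the placeholder coarse gradient are assumptions used only to illustrate how the float weights are blended with their quantization before the coarse-gradient correction.

```python
import numpy as np

def quantize(w):
    # Illustrative binary quantization (scaled sign); the paper treats more
    # general low-bit schemes.
    return np.mean(np.abs(w)) * np.sign(w)

def bcgd_step(w_float, coarse_grad, lr, rho):
    # Blended coarse gradient descent update: a convex combination ("blending")
    # of the float weights and their quantization, corrected by the coarse gradient.
    return (1.0 - rho) * w_float + rho * quantize(w_float) - lr * coarse_grad

rng = np.random.default_rng(0)
w = rng.standard_normal(16)
for t in range(200):
    w_q = quantize(w)              # quantized weights used in the forward pass
    coarse_grad = w_q              # placeholder: backprop with a straight-through
                                   # proxy for the quantized activations
    w = bcgd_step(w, coarse_grad, lr=0.05, rho=1e-3)

w_deploy = quantize(w)             # fully quantized weights for inference
```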
We study the residual diffusion phenomenon in chaotic advection computationally via an adaptive orthogonal basis. The chaotic advection is generated by a class of time-periodic cellular flows arising in modeling the transition to turbulence in Rayleigh-Bénard experiments. Residual diffusion refers to the non-zero effective (homogenized) diffusion in the limit of zero molecular diffusion, a result of chaotic mixing of the streamlines. In this limit, the solutions of the advection-diffusion equation develop sharp gradients and demand a large number of Fourier modes to resolve, rendering computation expensive. We construct an adaptive orthogonal basis (training) with built-in sharp-gradient structures from fully resolved spectral solutions at a few sampled molecular diffusivities. This is done by taking snapshots of solutions in time and performing a singular value decomposition of the matrix whose columns are these snapshots. The singular values decay rapidly, allowing us to extract a small percentage of left singular vectors corresponding to the top singular values as adaptive basis vectors. The trained orthogonal adaptive basis makes possible low-cost computation of the effective diffusivities at smaller molecular diffusivities (testing). The testing errors decrease as the training occurs at smaller molecular diffusivities. We make use of the Poincaré map of the advection-diffusion equation to bypass long-time simulation and gain accuracy in computing the effective diffusivity and learning the adaptive basis. We observe a non-monotone relationship between residual diffusivity and the amount of chaos in the advection, though the overall trend is that sufficient chaos leads to higher residual diffusivity.
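A minimal sketch of the snapshot-SVD construction described above, under stated assumptions: the snapshot matrix here is random placeholder data (real snapshots would come from the resolved spectral solver), and the 0.9999 energy threshold is an arbitrary illustrative choice.

```python
import numpy as np

# Columns of S are solution snapshots of the advection-diffusion equation,
# fully resolved at a few sampled ("training") molecular diffusivities.
rng = np.random.default_rng(0)
n_modes, n_snapshots = 4096, 200
S = rng.standard_normal((n_modes, n_snapshots))   # placeholder snapshot matrix

U, sigma, _ = np.linalg.svd(S, full_matrices=False)

# Rapid decay of the singular values lets us keep only the leading left
# singular vectors as the adaptive orthogonal basis.
energy = np.cumsum(sigma**2) / np.sum(sigma**2)
r = int(np.searchsorted(energy, 0.9999)) + 1
basis = U[:, :r]

# At a new ("testing") molecular diffusivity the solution is sought in
# span(basis), e.g. via a Galerkin projection of the Poincare map; here we
# only show projecting a snapshot onto the reduced basis.
u_new = S[:, 0]
coeffs = basis.T @ u_new
u_reduced = basis @ coeffs
```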