2017
DOI: 10.1587/nolta.8.289

A novel quasi-Newton-based optimization for neural network training incorporating Nesterov's accelerated gradient

Abstract: This paper describes a novel quasi-Newton (QN) based acceleration technique for training neural networks. Recently, Nesterov's accelerated gradient method has been utilized to accelerate gradient-based training. In this paper, the acceleration of the QN training algorithm is realized by a quadratic approximation of the error function that incorporates the momentum term as in Nesterov's method. It is shown that the proposed algorithm has a convergence property similar to that of the QN method. Neural networ…
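
As background, here is a minimal sketch of the classical Nesterov's accelerated gradient (NAG) step that the abstract builds on. The function name, hyperparameter values, and toy error function are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nag_step(w, v, grad_fn, lr=0.01, mu=0.9):
    """One step of classical Nesterov's accelerated gradient (NAG).

    The gradient is evaluated at the look-ahead point w + mu*v; this is
    the same Nesterov gradient, grad E(w_k + mu*v_k), that the paper's
    QN-based method incorporates into its quadratic approximation.
    """
    g = grad_fn(w + mu * v)   # Nesterov gradient at the look-ahead point
    v = mu * v - lr * g       # momentum update using the look-ahead gradient
    return w + v, v           # parameter update

# Toy usage on E(w) = 0.5 * ||w||^2, whose gradient is w:
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = nag_step(w, v, lambda u: u)
```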

Cited by 21 publications (51 citation statements)
References 12 publications

“…In this section, NAQ [9][13] and the adaptive momentum coefficient scheme [11] are introduced. The proposed AdaNAQ algorithm combines NAQ with the adaptive momentum coefficient scheme [14].…”
Section: Formulation of Neural Network Training (mentioning)
confidence: 99%
“…The former [5][6][7][8] are often used and improved because of the simplicity of their calculation. However, when applied to highly nonlinear problems, first-order methods still converge too slowly, and the optimization error cannot be effectively reduced within a finite time [1][9]. To deal with this problem, the quasi-Newton (QN) method, one of the most efficient optimization methods with superlinear convergence, has been widely utilized as a robust training algorithm [10].…”
Section: Introduction (mentioning)
confidence: 99%
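
For context, the standard quasi-Newton (BFGS) iteration that this quote refers to can be sketched as follows; this is the textbook form, not necessarily the specific variant of [10]:

$$
w_{k+1} = w_k + \alpha_k p_k, \qquad p_k = -H_k \nabla E(w_k),
$$
$$
H_{k+1} = \left(I - \rho_k s_k y_k^{\top}\right) H_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top},
$$

with $s_k = w_{k+1} - w_k$, $y_k = \nabla E(w_{k+1}) - \nabla E(w_k)$, and $\rho_k = 1/(y_k^{\top} s_k)$.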
“…Several modifications have been proposed to the quasi-Newton method to obtain stronger convergence. The Nesterov's Accelerated Quasi-Newton (NAQ) method [15] gives faster convergence compared to the standard quasi-Newton methods. NAQ obtains faster convergence by a quadratic approximation at $w_k + \mu v_k$ and by incorporating the Nesterov's accelerated gradient $\nabla E(w_k + \mu v_k)$. The derivation of NAQ is briefly introduced as follows:…”
Section: … (mentioning)
confidence: 99%
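
The quadratic approximation mentioned in the quote is, in sketch form, the second-order Taylor expansion of the error function around the look-ahead point $w_k + \mu v_k$ (notation assumed to match the quote):

$$
E(w_k + \mu v_k + p) \approx E(w_k + \mu v_k) + \nabla E(w_k + \mu v_k)^{\top} p + \tfrac{1}{2}\, p^{\top} \nabla^2 E(w_k + \mu v_k)\, p,
$$

minimized over the step $p$, with the Hessian $\nabla^2 E$ replaced by a quasi-Newton approximation.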
“…The NAQ [16] method achieves faster convergence compared to the standard QN by a quadratic approximation of the objective function at $w_k + \mu v_k$ and by incorporating the Nesterov's accelerated gradient $\nabla E(w_k + \mu v_k)$ in its Hessian update. The update vector of NAQ is given as…”
(mentioning)
confidence: 99%
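
The quote breaks off before the update vector itself; based on the description above, a plausible sketch of the NAQ update, with $\hat{H}_k$ the quasi-Newton approximation of the inverse Hessian built from the Nesterov gradients, is:

$$
v_{k+1} = \mu v_k - \alpha_k \hat{H}_k \nabla E(w_k + \mu v_k), \qquad w_{k+1} = w_k + v_{k+1}.
$$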