2021
DOI: 10.1587/nolta.12.554
Momentum acceleration of quasi-Newton based optimization technique for neural network training

Abstract: This paper describes a momentum acceleration technique for quasi-Newton (QN) based neural network training and verifies its performance and computational complexity. Recently, Nesterov's accelerated quasi-Newton method (NAQ), which incorporates Nesterov's accelerated gradient into QN, has been shown to reduce both the number of iterations and the total training time through its momentum term. However, NAQ requires the gradient to be computed twice in each iteration. This incr…
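The abstract's point about the momentum term and the cost of the extra gradient evaluation can be illustrated with a minimal sketch of a NAQ-style training step. This is not the paper's implementation: the helper names (loss_grad, hess_inv_apply), the fixed step size, and the momentum value are assumptions for illustration.

```python
def naq_style_step(w, v, loss_grad, hess_inv_apply, mu=0.85, alpha=0.1):
    """One NAQ-style iteration (illustrative sketch).

    w              -- current weights (array-like)
    v              -- current velocity / momentum vector
    loss_grad      -- callable returning the gradient of the training loss
    hess_inv_apply -- callable applying the quasi-Newton approximate inverse
                      Hessian (e.g. maintained by BFGS updates) to a vector
    mu, alpha      -- momentum parameter and step size (placeholder values)
    """
    # Nesterov-style gradient at the momentum-shifted point w + mu*v.
    # NAQ needs two gradient evaluations per iteration (the abstract's point);
    # this sketch only shows the one at the shifted point.
    g_shifted = loss_grad(w + mu * v)

    # Quasi-Newton search direction from the approximate inverse Hessian.
    d = -hess_inv_apply(g_shifted)

    # Velocity and weight updates with the Nesterov-style momentum term.
    v_new = mu * v + alpha * d
    w_new = w + v_new
    return w_new, v_new
```

The citation statements quoted below discuss approximating this shifted gradient from gradients that are already available, so that the extra evaluation can be avoided.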

Cited by 5 publications (4 citation statements)
References: 21 publications
“…Also note that the proposed L-SR1-N has two gradient computations per iteration. The Nesterov's gradient ∇E(w_k + µ_k v_k) can be approximated [25,29] as a linear combination of past gradients as shown below.…”
Section: Proposed Methods
Mentioning, confidence: 99%
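The equation that the quote says is "shown below" is cut off in this excerpt. A hedged reconstruction of the usual form of such a linear-combination approximation, taking v_k as the previous step w_k - w_{k-1} (the coefficients here are an assumption in the spirit of MoQ, not a quotation from the cited work), is

$$
\nabla E(\mathbf{w}_k + \mu_k \mathbf{v}_k) \approx (1 + \mu_k)\,\nabla E(\mathbf{w}_k) - \mu_k\,\nabla E(\mathbf{w}_{k-1}),
$$

which reuses the current and previous gradients and therefore avoids a second gradient evaluation at the shifted point.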
“…Recent works such as [22,23] have proposed sampled LSR1 (limited-memory) quasi-Newton updates for machine learning and describe efficient ways of implementing distributed training. Recent studies such as [24,25] have shown that the BFGS method can be accelerated by using Nesterov's accelerated gradient and momentum terms. In this paper, we explore whether Nesterov's acceleration can be applied to the LSR1 quasi-Newton method as well.…”
Section: Related Work
Mentioning, confidence: 99%
“…Often, these algorithms reach, yet do not leave, local minima. Their convergence slows considerably once they reach certain local optima, even though momentum techniques are very common [14]. In this gradient descent algorithm, explicit mathematical expressions are needed in order to obtain the gradient components.…”
Section: Introduction
Mentioning, confidence: 99%
“…(8) where µ is the momentum parameter and ∇f(θ_k + µv_k) is the Nesterov's accelerated gradient. MoQ (Mahboubi et al. 2021) approximated ∇f(θ_k + µv_k) in NAQ as a linear combination of past gradients. The acceleration of second-order methods offers promising scope for numerous applications and is the focus of this research.…”
Section: Introduction
Mentioning, confidence: 99%
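Tying this back to the earlier NAQ-style sketch, the variant below replaces the explicit evaluation of ∇f(θ_k + µv_k) with a linear combination of gradients that were already computed, so only one new gradient is needed per iteration. The function name moq_style_step and the coefficient choice are illustrative assumptions, not the paper's code.

```python
def moq_style_step(w, v, g_curr, g_prev, hess_inv_apply, mu=0.85, alpha=0.1):
    """MoQ-style variant of naq_style_step (illustrative sketch).

    g_curr, g_prev -- gradients already computed at the current and previous
                      iterates; no gradient is evaluated at w + mu*v.
    """
    # Approximate the Nesterov gradient by a linear combination of stored
    # gradients (assumed coefficients: (1 + mu) and -mu).
    g_approx = (1.0 + mu) * g_curr - mu * g_prev

    # Quasi-Newton direction and momentum updates as in the earlier sketch.
    d = -hess_inv_apply(g_approx)
    v_new = mu * v + alpha * d
    w_new = w + v_new
    return w_new, v_new
```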