2021
DOI: 10.1587/nolta.12.554
Momentum acceleration of quasi-Newton based optimization technique for neural network training

Abstract: This paper describes a momentum acceleration technique for quasi-Newton (QN) based neural network training and verifies its performance and computational complexity. Recently, Nesterov's accelerated quasi-Newton method (NAQ), which incorporates Nesterov's accelerated gradient into QN, has been shown to reduce both the number of iterations and the total training time through its momentum term. However, NAQ requires the gradient to be computed twice in each iteration. This incr…
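The abstract's point about the momentum term and the cost of the extra gradient evaluation can be illustrated with a minimal sketch of a NAQ-style training step. This is not the paper's implementation: the helper names (loss_grad, hess_inv_apply), the fixed step size, and the momentum value are assumptions for illustration.

```python
def naq_style_step(w, v, loss_grad, hess_inv_apply, mu=0.85, alpha=0.1):
    """One NAQ-style iteration (illustrative sketch).

    w              -- current weights (array-like)
    v              -- current velocity / momentum vector
    loss_grad      -- callable returning the gradient of the training loss
    hess_inv_apply -- callable applying the quasi-Newton approximate inverse
                      Hessian (e.g. maintained by BFGS updates) to a vector
    mu, alpha      -- momentum parameter and step size (placeholder values)
    """
    # Nesterov-style gradient at the momentum-shifted point w + mu*v.
    # NAQ needs two gradient evaluations per iteration (the abstract's point);
    # this sketch only shows the one at the shifted point.
    g_shifted = loss_grad(w + mu * v)

    # Quasi-Newton search direction from the approximate inverse Hessian.
    d = -hess_inv_apply(g_shifted)

    # Velocity and weight updates with the Nesterov-style momentum term.
    v_new = mu * v + alpha * d
    w_new = w + v_new
    return w_new, v_new
```

The citation statements quoted below discuss approximating this shifted gradient from gradients that are already available, so that the extra evaluation can be avoided.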

Cited by 5 publications (4 citation statements)
References: 21 publications
“…Also note that the proposed L-SR1-N has two gradient computations per iteration. The Nesterov's gradient ∇E(w_k + µ_k v_k) can be approximated [25,29] as a linear combination of past gradients as shown below.…”
Section: Proposed Methods
Mentioning, confidence: 99%
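The equation that the quote says is "shown below" is cut off in this excerpt. A hedged reconstruction of the usual form of such a linear-combination approximation, taking v_k as the previous step w_k - w_{k-1} (the coefficients here are an assumption in the spirit of MoQ, not a quotation from the cited work), is

$$
\nabla E(\mathbf{w}_k + \mu_k \mathbf{v}_k) \approx (1 + \mu_k)\,\nabla E(\mathbf{w}_k) - \mu_k\,\nabla E(\mathbf{w}_{k-1}),
$$

which reuses the current and previous gradients and therefore avoids a second gradient evaluation at the shifted point.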
“…Recent works such as [22,23] have proposed sampled LSR1 (limited-memory) quasi-Newton updates for machine learning and describe efficient ways of implementing distributed training. Recent studies such as [24,25] have shown that the BFGS method can be accelerated by using Nesterov's accelerated gradient and momentum terms. In this paper, we explore whether Nesterov's acceleration can be applied to the LSR1 quasi-Newton method as well.…”
Section: Related Work
Mentioning, confidence: 99%
“…Often, these algorithms reach, yet do not leave, local minima. Their convergence slows considerably once they reach certain local optima, even though momentum techniques are very common [14]. In this gradient descent algorithm, explicit mathematical expressions are needed in order to obtain the gradient components.…”
Section: Introduction
Mentioning, confidence: 99%
“…(8) where µ is the momentum parameter and ∇f(θ_k + µv_k) is the Nesterov's accelerated gradient. MoQ (Mahboubi et al. 2021) approximated ∇f(θ_k + µv_k) in NAQ as a linear combination of past gradients. The acceleration of second-order methods offers promising scope for numerous applications and is the focus of this research.…”
Section: Introduction
Mentioning, confidence: 99%
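Tying this back to the earlier NAQ-style sketch, the variant below replaces the explicit evaluation of ∇f(θ_k + µv_k) with a linear combination of gradients that were already computed, so only one new gradient is needed per iteration. The function name moq_style_step and the coefficient choice are illustrative assumptions, not the paper's code.

```python
def moq_style_step(w, v, g_curr, g_prev, hess_inv_apply, mu=0.85, alpha=0.1):
    """MoQ-style variant of naq_style_step (illustrative sketch).

    g_curr, g_prev -- gradients already computed at the current and previous
                      iterates; no gradient is evaluated at w + mu*v.
    """
    # Approximate the Nesterov gradient by a linear combination of stored
    # gradients (assumed coefficients: (1 + mu) and -mu).
    g_approx = (1.0 + mu) * g_curr - mu * g_prev

    # Quasi-Newton direction and momentum updates as in the earlier sketch.
    d = -hess_inv_apply(g_approx)
    v_new = mu * v + alpha * d
    w_new = w + v_new
    return w_new, v_new
```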