2017
DOI: 10.3906/mat-1411-51

Near optimal step size and momentum in gradient descent for quadratic functions

Abstract: Many problems in statistical estimation, classification, and regression can be cast as optimization problems. Gradient descent, one of the simplest and easiest-to-implement multivariate optimization techniques, lies at the heart of many powerful classes of optimization methods. However, its major disadvantage is its slower rate of convergence compared with other, more sophisticated algorithms. In order to improve the convergence speed of gradient descent, we simultaneously determine near-optimal scal…
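The abstract describes speeding up gradient descent on quadratic objectives by choosing a near-optimal step size and momentum term. Below is a minimal sketch of gradient descent with a momentum (heavy-ball) term on a quadratic f(x) = ½xᵀAx − bᵀx; the matrix A, the vector b, and the values of alpha and beta are illustrative assumptions, not the paper's data or its exact parameter choices.

```python
import numpy as np

def gradient_descent_momentum(A, b, alpha, beta, x0, iters=500):
    """Heavy-ball gradient descent on f(x) = 0.5 * x^T A x - b^T x."""
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(iters):
        grad = A @ x - b                          # gradient of the quadratic
        x_next = x - alpha * grad + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Illustrative data (assumed, not from the paper)
A = np.array([[3.0, 0.5], [0.5, 1.0]])            # symmetric positive definite
b = np.array([1.0, -2.0])
x_star = np.linalg.solve(A, b)                    # exact minimizer, for reference
x_hat = gradient_descent_momentum(A, b, alpha=0.3, beta=0.5, x0=np.zeros(2))
print(np.allclose(x_hat, x_star, atol=1e-8))      # True once the iteration has converged
```

With a well-chosen alpha and beta the iterates contract geometrically toward the minimizer; the paper's contribution is how to pick these two parameters near-optimally for quadratic functions.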

Cited by 5 publications (3 citation statements)
References 10 publications
“…Powell is mostly used for a local search of optimal values; it is simple to compute, converges quickly, and is highly accurate, but it easily falls into a local optimum when searching for the registration parameters, leaving the registration accuracy insufficient. GD can reduce the optimization time by iterating in the direction of the fastest change of the objective function value (31), which improves the speed of the algorithm, but because of the limitation of each iteration step it does not guarantee finding a global optimal solution that meets the requirements (32–34).…”
Section: Related Work
confidence: 99%
“…The resulting algorithm is given below (Algorithm 2). In the experiments performed, the learning rate and momentum coefficient computed from approximate estimates of the largest and smallest eigenvalues were observed to be very close to the true values of these parameters [14]. The rank of … gives an upper bound on the number of iterations required.…”
unclassified
“…Taş and Memmedli [14] developed a fast version of the gradient descent algorithm (HGD) (Algorithm 1). HGD is based on a near-optimal learning rate and momentum coefficient computed from the largest and smallest eigenvalues of the Hessian matrix of the quadratic error function.…”
unclassified
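The statement above says that HGD's learning rate and momentum are computed from the extreme eigenvalues of the Hessian of the quadratic error function. The sketch below uses the classical heavy-ball tuning for quadratics, α = 4/(√λ_max + √λ_min)² and β = ((√λ_max − √λ_min)/(√λ_max + √λ_min))²; these are the textbook formulas, assumed here for illustration, and may differ from the exact expressions in Taş and Memmedli [14].

```python
import numpy as np

def near_optimal_params(H):
    """Near-optimal step size and momentum for a quadratic with Hessian H.

    Classical heavy-ball tuning from the extreme eigenvalues of H;
    assumed formulas, not necessarily those of the cited paper.
    """
    eigvals = np.linalg.eigvalsh(H)           # ascending eigenvalues of symmetric H
    lam_min, lam_max = eigvals[0], eigvals[-1]
    s_max, s_min = np.sqrt(lam_max), np.sqrt(lam_min)
    alpha = 4.0 / (s_max + s_min) ** 2         # step size
    beta = ((s_max - s_min) / (s_max + s_min)) ** 2  # momentum coefficient
    return alpha, beta

# Example with an assumed Hessian; rough estimates of the extreme eigenvalues
# (e.g. from a few power-method iterations) would give nearly the same values.
H = np.array([[3.0, 0.5], [0.5, 1.0]])
alpha, beta = near_optimal_params(H)
print(alpha, beta)
```

This matches the citing paper's observation that parameters computed from approximate eigenvalue estimates stay close to those computed from the exact extreme eigenvalues.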