Backtracking Gradient Descent Method and Some Applications in Large Scale Optimisation. Part 2: Algorithms and Experiments

Truong, Tuyen Trung; Nguyen, Hang-Tuan

doi:10.1007/s00245-020-09718-8

Cited by 21 publications

(35 citation statements)

References 27 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…In case f has compact sublevels, then this is easily proven [9]. For the general case, see [10] for a proof.…”

Section: Convergence Resultsmentioning

confidence: 99%

“…There are many popular modifications trying to overcome this, such as Adam, Adadelta, Nesterov Accelerated Gradient, Momentum and so on (see [14] for a review), none of these are guaranteed to converge in general either. To date, only Backtracking GD is guaranteed to converge: see Chapter 12 in [9], in particular Proposition 12.6.1 there, for the case f ∈ C 1,1 L and has compact sublevels and has at most countably many critical points, see [8] when f is real analytic (or more generally satisfies the so-called Losjasiewicz gradient inequality), and see [10] for the general case of f being C 1 only and has at most countably many critical points. Note that the assumption in the last paper is not too restrictive: indeed, it is known from transversality results that such an assumption is satisfied by a generic C 1 function (for example, by Morse's functions, which are a well-known class of functions in geometry and analysis).…”

Section: Convergence Resultsmentioning

confidence: 99%

“…Both [9,10] start from the following property: If {x n } is constructed as above, and {x n j } is a convergent subsequence, then lim j→∞ ∇ f (x n j ) = 0. This is classically known (see [15]), the main idea is as follows: if lim j→∞ x n j = x ∞ and ∇ f (x ∞ ) = 0, then lim inf j→∞ δ n j > 0.…”

Section: Convergence Resultsmentioning

confidence: 99%

“…In the general case, the above proof does not go through, since the set X in the above may not be bounded. In [10], a way to go around is as follows. We let (P k , d) be the real projective space with its canonical metric (the spherical metric).…”

Section: Convergence Resultsmentioning

confidence: 99%

“…Then, even though it dates back more than 170 years [4], only gradually (with some results announced only very recently) it has been shown that GD has good properties: it can avoid saddle points [5,6]. While its standard version does not guarantee convergence to critical points, its Backtracking version [7] does [8][9][10] (the latter paper consists of the more experimental part of arXiv:1808.05160, in combination with arXiv:2001.02005 and arXiv:2007.03618) and can be implemented in deep neural networks with very good performance on CIFAR10 and CIFAR100 image datasets [10,11]. Some further modifications of Backtracking GD can avoid saddle points as well [12,13].…”

Section: Motivationmentioning

confidence: 99%

See 4 more Smart Citations

When Will a Sequence of Points in a Riemannian Submanifold Converge?

Truong

2020

Mathematics

Self Cite

View full text Add to dashboard Cite

Let X be a Riemannian manifold and xn a sequence of points in X. Assume that we know a priori some properties of the set A of cluster points of xn. The question is under what conditions that xn will converge. An answer to this question serves to understand the convergence behaviour for iterative algorithms for (constrained) optimisation problems, with many applications such as in Deep Learning. We will explore this question, and show by some examples that having X a submanifold (more generally, a metric subspace) of a good Riemannian manifold (even in infinite dimensions) can greatly help.

show abstract

“…In case f has compact sublevels, then this is easily proven [9]. For the general case, see [10] for a proof.…”

Section: Convergence Resultsmentioning

confidence: 99%

Section: Convergence Resultsmentioning

confidence: 99%