“…There are many popular modifications trying to overcome this, such as Adam, Adadelta, Nesterov Accelerated Gradient, Momentum and so on (see [14] for a review), but none of these is guaranteed to converge in general either. To date, only Backtracking GD is guaranteed to converge: see Chapter 12 in [9], in particular Proposition 12.6.1 there, for the case where f ∈ C^{1,1}_L has compact sublevel sets and at most countably many critical points; see [8] when f is real analytic (or, more generally, satisfies the so-called Łojasiewicz gradient inequality); and see [10] for the general case of f being only C^1 with at most countably many critical points. Note that the assumption in the last paper is not too restrictive: indeed, it is known from transversality results that such an assumption is satisfied by a generic C^1 function (for example, by Morse functions, a well-known class of functions in geometry and analysis).…”
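To make the passage concrete, the following is a minimal sketch of Backtracking GD with the standard Armijo condition: at each step the learning rate δ is shrunk by a factor β until f(x − δ∇f(x)) ≤ f(x) − αδ‖∇f(x)‖². The function names, parameter choices (α = β = 0.5, δ₀ = 1), and the test function are illustrative assumptions, not taken from the papers cited above.

```python
import numpy as np

def backtracking_gd(f, grad, x0, delta0=1.0, alpha=0.5, beta=0.5,
                    tol=1e-10, max_iter=10_000):
    """Gradient descent with Armijo backtracking line search (a sketch).

    At each iteration, the step size delta is halved until the Armijo
    sufficient-decrease condition
        f(x - delta*g) <= f(x) - alpha * delta * ||g||^2
    holds, then the gradient step is taken.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # stop near a critical point
            break
        delta = delta0
        # Shrink delta until the Armijo condition is satisfied.
        while f(x - delta * g) > f(x) - alpha * delta * np.dot(g, g):
            delta *= beta
        x = x - delta * g
    return x

# Illustrative usage on a simple smooth function f(x, y) = x^2 + 2*y^2,
# whose unique critical point (the minimizer) is the origin.
f = lambda x: x[0] ** 2 + 2 * x[1] ** 2
grad = lambda x: np.array([2 * x[0], 4 * x[1]])
x_min = backtracking_gd(f, grad, np.array([3.0, -2.0]))
```

The key difference from plain GD with a fixed learning rate is that the step size adapts to the local behavior of f, which is what the cited convergence guarantees exploit.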