Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 2017
DOI: 10.1145/3055399.3055464

Finding approximate local minima faster than gradient descent

Abstract: We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples. The time complexity of our algorithm to find an approximate local minimum is even faster than that of gradient descent to find a critical point. Our algorithm applies to a general class of optimization problems including training a neural network and other non-convex objectives arising in machine learning.
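For context on the guarantee above (a hedged restatement of the standard notion in this line of work, not a quotation from the abstract): an $\varepsilon$-approximate local minimum is a point $x$ with $\|\nabla f(x)\| \le \varepsilon$ and $\nabla^2 f(x) \succeq -\sqrt{\rho\,\varepsilon}\, I$, where $\rho$ is the Lipschitz constant of the Hessian. Gradient descent, by contrast, is only guaranteed to reach a point with small gradient norm, which may be a saddle point.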

Cited by 149 publications (325 citation statements)
References 19 publications
“…The idea is to form the Hessian of the objective function from the Hessians of a relatively small number of randomly chosen summands [Ghadimi et al., 2017]. Another idea is to avoid inverting the Hessian at each iteration and instead use information about the eigenvector corresponding to its smallest eigenvalue [Agarwal et al., 2017; Carmon et al., 2017]. To compute such a vector approximately, it suffices to be able to multiply the Hessian by an arbitrary vector:…”
Section: Discussion (unclassified)
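A minimal sketch of that last point, assuming a differentiable objective and using JAX purely for convenience: the Hessian is never formed or inverted; an approximate eigenvector for its smallest eigenvalue comes from power iteration on a shifted operator, and each Hessian-vector product is obtained from two gradient evaluations by central differences. The shift beta (assumed to upper-bound the largest Hessian eigenvalue), the iteration count, and the toy objective are illustrative choices, not taken from the cited papers.

```python
import jax
import jax.numpy as jnp

def min_eig_direction(grad_f, x, beta, iters=100, r=1e-4, seed=0):
    """Approximate eigenvector of the smallest Hessian eigenvalue at x.

    Uses only gradient evaluations: H v is approximated by the central
    difference (grad_f(x + r v) - grad_f(x - r v)) / (2 r), and power
    iteration is run on (beta*I - H), whose top eigenvector corresponds to
    the smallest eigenvalue of H whenever beta >= lambda_max(H).
    """
    hvp = lambda v: (grad_f(x + r * v) - grad_f(x - r * v)) / (2.0 * r)
    v = jax.random.normal(jax.random.PRNGKey(seed), x.shape)
    v = v / jnp.linalg.norm(v)
    for _ in range(iters):
        w = beta * v - hvp(v)          # apply the shifted operator
        v = w / jnp.linalg.norm(w)
    return v, jnp.vdot(v, hvp(v))      # direction and its Rayleigh quotient

# Toy non-convex objective with a saddle at the origin:
# curvature +2 along the first axis, -2 along the second.
f = lambda z: z[0] ** 2 - z[1] ** 2 + 0.1 * jnp.sum(z ** 4)
v, lam = min_eig_direction(jax.grad(f), jnp.zeros(2), beta=10.0)
print(v, lam)   # roughly (0, +/-1) and -2: a negative-curvature direction
```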
“…These approaches yield a worst-case operational complexity of $O(n\epsilon_g^{-3/2})$ when $\epsilon_H = \epsilon_g^{1/2}$. Two independently proposed algorithms, respectively based on adapting accelerated gradient to the nonconvex setting [11] and approximately solving the cubic regularization subproblem [1], require $\tilde{O}(\epsilon_g^{-7/4})$ operations (with high probability, showing dependency only on $\epsilon_g$) to find a point $x$ that satisfies (7) when $\epsilon_H = \epsilon_g^{1/2}$. The difference of a factor of $\epsilon_g^{-1/4}$ with the iteration complexity bounds arises from the cost of computing a negative curvature direction of $\nabla^2 f(x_k)$ and/or the cost of solving a linear system.…”
Section: Related Work (mentioning)
confidence: 99%
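Restating that accounting explicitly (my reading of the excerpt, not a quotation): $\tilde{O}(\epsilon_g^{-7/4}) = \tilde{O}(\epsilon_g^{-3/2}) \cdot \tilde{O}(\epsilon_g^{-1/4})$, i.e. roughly $\epsilon_g^{-3/2}$ outer iterations, each paying on the order of $\epsilon_g^{-1/4}$ Hessian-vector products for the inexact negative-curvature or linear-system subproblem.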
“…1 Introduction. We consider the following constrained optimization problem: $\min f(x)$ subject to $x \ge 0$ (1), where $f : \mathbb{R}^n \to \mathbb{R}$ is a nonconvex function, twice uniformly Lipschitz continuously differentiable in the interior of the nonnegative orthant. We assume that explicit storage of the Hessian $\nabla^2 f(x)$ for $x > 0$ is undesirable, but that Hessian-vector products of the form $\nabla^2 f(x)v$ can be computed at any $x > 0$ for arbitrary vectors $v$. Computational differentiation techniques [29] can be used to evaluate such products at a cost that is a small multiple of the cost of evaluation of the gradient $\nabla f$.…”
mentioning
confidence: 99%
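A minimal sketch of that last claim, with JAX standing in for the computational-differentiation tool (an assumption on tooling; the excerpt's [29] refers to computational differentiation in general): a Hessian-vector product is the forward-mode derivative of the gradient, so it costs a small constant multiple of one gradient evaluation and the $n \times n$ Hessian is never stored. The objective, point, and direction below are illustrative.

```python
import jax
import jax.numpy as jnp

def hvp(f, x, v):
    # nabla^2 f(x) @ v via forward-over-reverse automatic differentiation:
    # differentiate grad(f) in the tangent direction v. Cost is a small
    # constant multiple of evaluating grad(f); the Hessian is never stored.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

# Illustrative smooth objective on the interior of the nonnegative orthant.
f = lambda x: jnp.sum(x * jnp.log(x)) + 0.5 * jnp.dot(x, x)

x = jnp.array([0.5, 1.0, 2.0])   # x > 0 componentwise
v = jnp.array([1.0, -1.0, 0.5])

print(hvp(f, x, v))              # matrix-free product
print(jax.hessian(f)(x) @ v)     # dense check, feasible only in low dimension
```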
“…Several works study special classes of non-convex objective functions and show that SGD or its variants converge on them. In addition, some researchers [2] find that in many machine learning problems the value attained at a local minimum is a good approximation of the global minimum. Moreover, obtaining a local minimum is not difficult, since local minima are plentiful.…”
Section: Distributed and Non-convex Extension (mentioning)
confidence: 99%