Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 2017
DOI: 10.1145/3055399.3055448
Katyusha: the first direct acceleration of stochastic gradient methods

Abstract: Nesterov's momentum trick is famously known for accelerating gradient descent, and has been proven useful in building fast iterative algorithms. However, in the stochastic setting, counterexamples exist and prevent Nesterov's momentum from providing similar acceleration, even if the underlying problem is convex. We introduce Katyusha, a direct, primal-only stochastic gradient method to fix this issue. It has a provably accelerated convergence rate in convex (off-line) stochastic optimization. The main ingredien…
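As a rough illustration of the style of update the abstract describes (a variance-reduced stochastic gradient step coupled with both Nesterov-style and "negative" momentum toward a snapshot point), here is a minimal Python sketch. The parameter choices, snapshot rule, and function names (`grad_i`, `full_grad`) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def katyusha_sketch(grad_i, full_grad, x0, n, L, sigma, epochs=20, m=None):
    """Sketch of a Katyusha-style accelerated variance-reduced loop (no proximal term).

    grad_i(x, i)  -- gradient of the i-th component function at x
    full_grad(x)  -- full gradient, computed once per epoch (snapshot)
    L, sigma      -- smoothness / strong-convexity constants (assumed known)
    """
    m = m or 2 * n                               # inner-loop length per epoch
    tau2 = 0.5                                   # weight of the "negative momentum" toward the snapshot
    tau1 = min(np.sqrt(m * sigma / (3 * L)), 0.5)
    alpha = 1.0 / (3.0 * tau1 * L)               # step size of the mirror-descent sequence

    x_tilde = y = z = x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_tilde)                  # snapshot full gradient for variance reduction
        for _ in range(m):
            # three-point coupling: Nesterov momentum via z, negative momentum via x_tilde
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            i = np.random.randint(n)
            g = grad_i(x, i) - grad_i(x_tilde, i) + mu   # variance-reduced stochastic gradient
            z = z - alpha * g                    # mirror-descent-style step
            y = x - g / (3 * L)                  # gradient-descent-style step
        x_tilde = y                              # simplified snapshot update
    return x_tilde
```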

Cited by 268 publications (723 citation statements); references 21 publications.
“…This results in an improved computational complexity of O((n + κ) log(1/ε)) passes over the data set to achieve an ε-optimal solution in expectation. When these methods are combined with Nesterov acceleration, the expected complexity becomes O((n + √(nκ)) log(1/ε)) (see, e.g., [7] and [1]).…”
Section: Stochastic Optimization of Least Squares
confidence: 99%
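The O((n + κ) log(1/ε)) rate in this excerpt is the one achieved by plain variance-reduced methods such as SVRG, before any acceleration is applied. A minimal sketch of such a method is below; the names (`grad_i`, `full_grad`) and the defaults are illustrative assumptions.

```python
import numpy as np

def svrg_sketch(grad_i, full_grad, x0, n, step, epochs=20, m=None):
    """Minimal SVRG-style sketch: each epoch uses one full gradient pass plus m
    cheap variance-reduced stochastic steps."""
    m = m or 2 * n
    x = x_tilde = x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_tilde)                          # one full pass per epoch
        for _ in range(m):
            i = np.random.randint(n)
            g = grad_i(x, i) - grad_i(x_tilde, i) + mu   # unbiased, reduced-variance estimate
            x = x - step * g
        x_tilde = x                                      # refresh the snapshot point
    return x
```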
“…In this section, we introduce NAPI, a noisy accelerated power method for solving (1). We then characterize the convergence rate of the proposed algorithm for the special case of computing the leading generalized eigenvector.…”
Section: Computing the Leading Generalized Eigenvector
confidence: 99%
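For context on what an accelerated power method for the leading generalized eigenvector looks like, here is a hedged sketch: a generalized power iteration with a heavy-ball-style momentum term. This is not the NAPI method from the cited paper; the momentum parameter and normalization scheme are illustrative assumptions.

```python
import numpy as np

def momentum_power_method(A, B, iters=200, beta=0.1, seed=0):
    """Sketch: power iteration with momentum for A v = lambda B v (B nonsingular)."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    w_prev = np.zeros(d)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(iters):
        w_next = np.linalg.solve(B, A @ w) - beta * w_prev   # momentum-augmented power step
        scale = np.linalg.norm(w_next)
        w_prev, w = w / scale, w_next / scale                # rescale both iterates together
    lam = (w @ (A @ w)) / (w @ (B @ w))                      # generalized Rayleigh quotient
    return lam, w
```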
“…[14] Recently, Allen-Zhu provided a new momentum, named Katyusha momentum, which achieves strong performance in many settings [3]. An illustration of the difference among SGD, momentum, and Nesterov's momentum is shown in Fig. 3.…”
Section: The Variants of SGD
confidence: 99%
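The distinction this excerpt refers to is easiest to see in the update rules themselves: plain SGD follows the current gradient, heavy-ball momentum accumulates a velocity, and Nesterov's momentum evaluates the gradient at a look-ahead point. A small sketch (step sizes and beta are illustrative):

```python
def sgd_step(x, grad, lr):
    """Plain SGD: move along the gradient at the current point."""
    return x - lr * grad(x)

def momentum_step(x, v, grad, lr, beta=0.9):
    """Heavy-ball momentum: accumulate a velocity, then move along it."""
    v = beta * v + grad(x)
    return x - lr * v, v

def nesterov_step(x, v, grad, lr, beta=0.9):
    """Nesterov's momentum: evaluate the gradient at the look-ahead point."""
    v = beta * v + grad(x - lr * beta * v)
    return x - lr * v, v
```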
“…where η > 0 is the step size. The dual averaging (DA) algorithm [16] is another widely used method for solving (1), which iterates as…”
Section: Introduction
confidence: 99%
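Since the excerpt's DA iteration is truncated, here is a hedged sketch of the standard (unconstrained, Nesterov-style) dual averaging loop: keep a running sum of subgradients and map it back to a primal point with an increasing regularization weight. The scaling constant `gamma` is an illustrative assumption.

```python
import numpy as np

def dual_averaging_sketch(grad, x0, T=1000, gamma=1.0):
    """Unconstrained dual averaging: x_{t+1} = x0 - (sum of gradients) / beta_t."""
    x = x0.copy()
    z = np.zeros_like(x0)               # accumulated (sub)gradients
    for t in range(1, T + 1):
        z += grad(x)
        beta = gamma * np.sqrt(t)       # increasing regularization weight
        x = x0 - z / beta               # primal point from the averaged dual variable
    return x
```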
“…In this paper, we develop a new dual-averaging primal-dual (DAPD) method for solving (1), which has an accelerated, optimal convergence rate. When f(Ax) has a finite-sum structure, we develop a stochastic version of DAPD, named SDAPD, which is also optimal and has better overall complexity on sparse data compared with existing algorithms of the same type.…”
Section: Introduction
confidence: 99%