2020
DOI: 10.48550/arxiv.2005.10785
Preprint

Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping

Abstract: In this paper, we propose a new accelerated stochastic first-order method called clipped-SSTM for smooth convex stochastic optimization with heavy-tailed distributed noise in stochastic gradients, and derive the first high-probability complexity bounds for this method, closing the gap in the theory of stochastic optimization with heavy-tailed noise. Our method is based on a special variant of accelerated Stochastic Gradient Descent (SGD) and clipping of stochastic gradients. We extend our method to the strongly …
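The clipping operation at the core of the method rescales a stochastic gradient whenever its norm exceeds a threshold, which bounds the influence of any single heavy-tailed sample. The sketch below is a minimal illustration of norm clipping combined with a plain SGD step; it is not the clipped-SSTM update itself (the accelerated sequences and parameter schedule are specified in the paper), and the function names and the `clip_level` parameter are assumptions made for this sketch.

```python
import numpy as np

def clip_gradient(g, clip_level):
    # Norm clipping: rescale g so its Euclidean norm is at most clip_level.
    norm = np.linalg.norm(g)
    if norm > clip_level:
        return g * (clip_level / norm)
    return g

def clipped_sgd_step(x, stochastic_grad, step_size, clip_level):
    # One gradient step applied to a clipped stochastic gradient.
    # stochastic_grad(x) returns a (possibly heavy-tailed) noisy gradient at x.
    g = clip_gradient(stochastic_grad(x), clip_level)
    return x - step_size * g
```

Because the clipped gradient has norm at most `clip_level`, each individual update stays bounded even when the raw noise has no finite higher moments, which is the intuition behind analyses of clipped methods under heavy-tailed noise.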

Cited by 12 publications (32 citation statements)
References 37 publications
“…The full details of the proof can be found in Section E in the Appendix. Our proof improves upon the previous analysis of clipped-SGD with a constant learning rate and O(n) batch size [21] and upon the high-probability bounds for SGD with O(1/t) step size in the sub-Gaussian setting [25]. Our analysis consists of three steps: (i) Expansion of the update rule.…”
Section: Heavy-tailed Linear Regression (mentioning)
Confidence: 76%
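As a generic illustration of step (i) (an assumption about what "expansion of the update rule" refers to here, not the cited paper's exact derivation): for an update of the form $x_{t+1} = x_t - \gamma_t \tilde g_t$ with clipped stochastic gradient $\tilde g_t$, expanding the squared distance to a minimizer $x^*$ gives

$$\|x_{t+1} - x^*\|^2 = \|x_t - x^*\|^2 - 2\gamma_t \langle \tilde g_t,\, x_t - x^* \rangle + \gamma_t^2 \|\tilde g_t\|^2,$$

after which the inner-product and squared-norm terms are controlled with high probability using the clipping threshold.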
“…Prasad et al [42] utilized the geometric median-of-means to robustly estimate gradients in each mini-batch. Gorbunov et al [21] and Nazin et al [39] proposed clipped-SSTM and RSMD, respectively, based on truncation of stochastic gradients for stochastic mirror/gradient descent. Zhang et al [51] analyzed the convergence of clipped-SGD in expectation but focused on a different noise regime where the distribution of stochastic gradients has bounded 1 + α moments for some 0 < α ≤ 1.…”
Section: Related Work (mentioning)
Confidence: 99%
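The robust-estimation approach attributed to Prasad et al. above can be illustrated with a median-of-means gradient estimator: split the per-sample gradients of a mini-batch into buckets, average within each bucket, and combine the bucket means with a median. The sketch below is a simplified illustration that uses a coordinate-wise median in place of the geometric median of [42]; the function name and default bucket count are assumptions.

```python
import numpy as np

def median_of_means_gradient(sample_grads, num_buckets=5):
    # sample_grads: array of shape (n, d), one stochastic gradient per sample.
    # Split the samples into buckets, average each bucket, then take the
    # coordinate-wise median of the bucket means (a stand-in for the
    # geometric median used for robust gradient estimation).
    buckets = np.array_split(sample_grads, num_buckets)
    bucket_means = np.stack([b.mean(axis=0) for b in buckets])
    return np.median(bucket_means, axis=0)
```

A single heavy-tailed or corrupted sample can dominate a plain mini-batch mean, but it can shift only one bucket mean, so the median across buckets remains stable.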