Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise
Preprint, 2021
DOI: 10.48550/arxiv.2106.05958

Abstract: Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it is essential to theoretically guarantee that algorithms provide small objective residual with high probability. Existing methods for non-smooth…

Cited by 5 publications (21 citation statements)
References 16 publications
“…This rate has a worse dependence on k than our best scheme, but has an improved dependence on δ. However, we stress that the setting of [7,8] is different from ours. Indeed in [13], which is more closely related to our proposal, the corresponding result contains also the term log(1/δ) (see Theorem 2).…”
Section: SGD
confidence: 88%
“…Despite obtaining near-optimal rates, both works suffer from either unpractical parameter settings or unrealistic assumptions. Moreover, differently from most results obtained in the light tailed case, in [13,8] the analysis is confined to a finite horizon, which is a limitation in many practical scenarios. Indeed, finite horizon methods cannot cope with online settings in which data arrives continuously in a potentially infinite stream of batches and the predictive model is updated accordingly.…”
Section: Introduction
confidence: 95%
“…Using Markov inequality, one can easily derive high-probability bounds with non-desirable polynomial dependence on 1/β, e.g., see the discussion in [Davis et al., 2021, Gorbunov et al., 2021]. Each complexity result, which we derive, relies only on one or two of these assumptions simultaneously.…”
confidence: 99%
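
For context, the Markov-inequality argument mentioned in the snippet above is standard. A minimal sketch, using generic symbols that are not taken from the cited text (f_* for the optimal value, x_k for the output after k iterations, ε for the target accuracy, and β for the failure probability): applying Markov's inequality to the nonnegative objective residual gives

\[
  \mathbb{P}\bigl(f(x_k) - f_* \ge \varepsilon\bigr)
  \;\le\; \frac{\mathbb{E}\bigl[f(x_k) - f_*\bigr]}{\varepsilon},
\]

so guaranteeing confidence level 1 − β requires driving the expected residual down to \(\mathbb{E}[f(x_k) - f_*] \le \beta\varepsilon\). With a typical in-expectation rate of order \(1/\sqrt{k}\) for non-smooth stochastic convex optimization, this translates into \(k = \mathcal{O}\bigl(1/(\beta^2\varepsilon^2)\bigr)\) iterations, i.e., a polynomial rather than logarithmic dependence on 1/β, which is the "non-desirable" dependence referred to above.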