2018
DOI: 10.1007/s10107-018-1297-x

On variance reduction for stochastic smooth convex optimization with multiplicative noise

Abstract: We propose dynamic sampled stochastic approximation (SA) methods for stochastic optimization with a heavy-tailed distribution (with finite second moment).
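As a rough illustration of the dynamic sampling idea (not the paper's exact algorithm), the sketch below averages a growing batch of N_k sampled gradients at iteration k before taking a stochastic approximation step; the oracle, the quadratic test objective, and the linear batch schedule are assumptions made for this example only.

```python
import numpy as np

def dynamic_sampled_sa(grad_oracle, x0, steps, stepsize, batch_schedule):
    """Generic dynamic-sampled stochastic approximation (SA) loop.

    grad_oracle(x, rng) returns one noisy gradient sample at x;
    batch_schedule(k) returns the growing batch size N_k, so the
    averaged gradient at iteration k has its variance cut by 1/N_k.
    """
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        n_k = batch_schedule(k)
        # Average an increasing batch of sampled gradients (variance reduction).
        g = np.mean([grad_oracle(x, rng) for _ in range(n_k)], axis=0)
        x = x - stepsize * g
    return x

# Illustrative oracle: gradient of f(x) = 0.5 ||x||^2 under multiplicative noise.
oracle = lambda x, rng: x * (1.0 + 0.5 * rng.standard_normal(x.shape))
x_hat = dynamic_sampled_sa(oracle, x0=np.ones(5), steps=100,
                           stepsize=0.5, batch_schedule=lambda k: k + 1)
```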

Cited by 40 publications (40 citation statements). References 38 publications (137 reference statements).

Citation statements (ordered by relevance):
“…Then by setting $\beta = \sqrt{3} - 1$, $\alpha_i = \frac{2 - \sqrt{3}}{L_i}$, and $p_i = \frac{1}{n}$ in (16), and using the above bound on $\mathbb{E}\big[N_i(k)^{-1}\big]$, we obtain the following recursion:…”
Section: Rate Analysis
confidence: 99%
“…A stochastic proximal gradient method was presented in [34] for solving composite convex stochastic optimization, where a.s. convergence and a mean-squared convergence rate of $O(1/k)$ were developed in strongly convex regimes, in sharp contrast with the linear rate of convergence in deterministic settings. Variance reduction schemes have gained increasing relevance in first-order methods for stochastic convex optimization [13-16, 35]; in one particular class of schemes, the true gradient is replaced by the average of an increasing batch of sampled gradients, progressively reducing the variance of the sample average. In strongly convex regimes, linear rates were shown for stochastic gradient methods [16, 35] and extragradient methods [15], while for merely convex optimization problems, accelerated rates of $O(1/k^2)$ and $O(1/k)$ were proven for smooth [13, 16] and nonsmooth [15] regimes, respectively.…”
Section: Introduction
confidence: 99%
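The mechanism quoted above relies on a standard variance bound: if the stochastic gradients $\nabla F(x,\xi_j)$ are i.i.d., unbiased for $\nabla f(x)$, and have variance at most $\sigma^2$ (notation assumed here, not taken from the excerpt), then averaging a batch of $N_k$ of them gives

```latex
\mathbb{E}\Big\|\tfrac{1}{N_k}\textstyle\sum_{j=1}^{N_k}\nabla F(x,\xi_j)-\nabla f(x)\Big\|^2
  \;=\; \tfrac{1}{N_k^2}\textstyle\sum_{j=1}^{N_k}\mathbb{E}\big\|\nabla F(x,\xi_j)-\nabla f(x)\big\|^2
  \;\le\; \tfrac{\sigma^2}{N_k},
```

where the equality uses independence and zero mean of the gradient errors. Letting the batch size $N_k$ grow therefore drives the gradient noise to zero, which is what permits the accelerated $O(1/k^2)$ smooth rate and the linear rates under strong convexity mentioned in the excerpt.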
“…Using slowly diminishing step sizes $\eta_t = O(1/t^{\alpha})$ ($\alpha < 1$), Ruppert (1988) and Polyak (1990) showed that acceleration using the average over trajectories of this recursive stochastic approximation algorithm attains the optimal convergence rate for a strongly convex $L$ (see Polyak and Juditsky (1992) for more details). Recently, the running times of stochastic first-order methods have been considerably improved by using combinations of variance reduction techniques (Roux et al., 2012; Johnson and Zhang, 2013) and Nesterov's acceleration (Lan, 2012, 2016; Cotter et al., 2011; Jofré and Thompson, 2017; Arjevani and Shamir, 2016). Despite the celebrated success of stochastic first-order methods in modern machine learning tasks, researchers have kept improving the per-iteration complexity of second-order methods such as Newton or quasi-Newton methods, because of their faster convergence.…”
Section: Relationships to the Literature
confidence: 99%
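A minimal sketch of the Ruppert-Polyak averaging scheme referenced above, with slowly diminishing steps $\eta_t = c/t^{\alpha}$ ($\alpha < 1$) and a running average of the iterates; the constants, the quadratic objective, and the additive-noise oracle are illustrative assumptions rather than details from the cited works.

```python
import numpy as np

def sgd_polyak_ruppert(grad_oracle, x0, steps, c=1.0, alpha=0.7):
    """SGD with slowly diminishing steps eta_t = c / t**alpha (alpha < 1).
    Returns the last iterate and the trajectory average; the average is
    the quantity shown to attain the optimal rate for strongly convex losses."""
    rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    x_avg = np.copy(x)
    for t in range(1, steps + 1):
        eta_t = c / t**alpha
        x = x - eta_t * grad_oracle(x, rng)
        x_avg += (x - x_avg) / t  # incremental running mean of the iterates
    return x, x_avg

# Illustrative strongly convex example: f(x) = 0.5 ||x||^2 with additive noise.
oracle = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
last_iterate, averaged_iterate = sgd_polyak_ruppert(oracle, np.ones(5), steps=5000)
```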
“…(iv) Variance reduction schemes for stochastic optimization. There has been an effort to utilize increasing batch sizes of sampled gradients in stochastic gradient schemes, leading to improved rates of convergence, as seen in strongly convex [19]-[21] and convex regimes [20]-[23]. Novelty and Contributions.…”
Section: Introduction
confidence: 99%