2020
DOI: 10.1109/jproc.2020.3028013

Variance-Reduced Methods for Machine Learning

Abstract: Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a method introduced over 60 years ago. The last 8 years have seen an exciting new development: variance reduction (VR) for stochastic optimization methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving a faster convergence than SGD in theory as well as practice. These speedups underline the surge of interest in VR methods and the …
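For context, variance reduction here refers to the finite-sum setting and SVRG-style gradient estimators; the formulation below is the standard one from the variance-reduction literature, paraphrased rather than quoted from the paper.

\min_{w \in \mathbb{R}^d} \; f(w) = \frac{1}{n} \sum_{i=1}^{n} f_i(w), \qquad w_{k+1} = w_k - \gamma\, g_k,

g_k^{\mathrm{SGD}} = \nabla f_{i_k}(w_k), \qquad g_k^{\mathrm{SVRG}} = \nabla f_{i_k}(w_k) - \nabla f_{i_k}(\tilde{w}) + \nabla f(\tilde{w}),

where $i_k$ is a uniformly sampled index and $\tilde{w}$ is a periodically refreshed snapshot of the iterate. Both estimators are unbiased for $\nabla f(w_k)$, but the variance of the SVRG estimator shrinks to zero as $w_k$ and $\tilde{w}$ approach the minimizer, which is what permits constant step sizes and the faster multi-pass convergence the abstract refers to.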

Cited by 71 publications (72 citation statements)
References 34 publications
“…Variance reduction in RL. The seminal idea of variance reduction was originally proposed to accelerate finite-sum stochastic optimization, e.g., Gower et al (2020); Johnson and Zhang (2013); Nguyen et al (2017). Thereafter, the variance reduction strategy has been imported to RL, which assists in improving the sample efficiency of RL algorithms in multiple contexts, including but not limited to policy evaluation (Du et al, 2017; Khamaru et al, 2020; Wai et al, 2019; Xu et al, 2019), RL with a generative model (Sidford et al, 2018a,b; Wainwright, 2019b), asynchronous Q-learning (Li et al, 2020b), and offline RL (Yin et al, 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…To reduce the variance of the gradient estimate (for stochastic optimisation) and to allow a constant stepsize, in recent years, several variance reduction techniques have been developed in the machine learning community, e.g., SAG [36], SAGA [15], SVRG [26], and SARAH [32]; see [20] for an up-to-date overview. These techniques reduce the variance of the gradient by including in the search direction an average of the full gradient, which is updated either according to a predefined update schedule, or per-iteration.…”
Section: Stochastic Expectation Maximisation (mentioning)
confidence: 99%
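
The excerpt above describes the shared mechanism of these methods: a stochastic gradient corrected by a full-gradient term that is refreshed either on a fixed schedule or per iteration. Below is a minimal SVRG-style sketch of that mechanism in Python; the function svrg, the toy least-squares problem, and all parameter values are illustrative assumptions and are not taken from the cited papers or their code.

import numpy as np

def svrg(grad_i, w0, n, step_size=0.1, n_epochs=20, inner_steps=None):
    """Minimal SVRG sketch for a finite-sum objective f(w) = (1/n) * sum_i f_i(w).

    grad_i(w, i) returns the gradient of the i-th component f_i at w.
    The full-gradient anchor is recomputed on a fixed schedule (once per epoch),
    matching the 'predefined update schedule' variant described above.
    """
    if inner_steps is None:
        inner_steps = n  # one effective pass over the data per snapshot
    w = w0.copy()
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        w_snap = w.copy()
        # Full gradient at the snapshot: one pass over all n components.
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced, unbiased gradient estimate:
            # E[g] = grad f(w), and its variance vanishes as w, w_snap -> w*.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= step_size * g
    return w

# Usage sketch on a toy least-squares problem (data are synthetic):
if __name__ == "__main__":
    n, d = 100, 5
    rng = np.random.default_rng(1)
    A, b = rng.standard_normal((n, d)), rng.standard_normal(n)
    grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
    w_hat = svrg(grad_i, np.zeros(d), n, step_size=0.05)

The snapshot schedule here is one full-gradient recomputation per epoch; SAG and SAGA instead maintain a table of per-component gradients updated at every iteration, avoiding the periodic full pass at the cost of extra memory.
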
“…A number of methods employ subsampled approximations of the objective function and its derivatives, with the aim of reducing the computational cost. Focusing on first-order methods, the stochastic gradient [26] and more contemporary variants like SVRG [19,20], SAG [27], ADAM [21] and SARAH [24] are widely used for their simplicity and low cost per-iteration. They do not call for function evaluations but require tuning the learning rate and further possible hyper-parameters such as the mini-batch size.…”
Section: Introduction (mentioning)
confidence: 99%