2018
DOI: 10.1137/17m1154679

Adaptive Sampling Strategies for Stochastic Optimization

Abstract: In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the regular computation of full gradients, the proposed method reduces variance by increasing the sample size as needed. The decision to increase the sample size is governed by an inner product test that ensures that search directions are descent directions with high probability […]
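The mechanism described in the abstract can be pictured as a stochastic-gradient loop whose batch grows whenever an approximate inner product test fails. The sketch below is illustrative only, assuming a hypothetical per-example gradient oracle `per_example_gradients`; the threshold `theta`, growth factor, and step size are placeholder values, not the paper's exact constants or implementation.

```python
import numpy as np

def per_example_gradients(x, idx):
    """Hypothetical oracle returning the per-sample gradients g_i(x),
    shape (len(idx), dim); replace with the problem's own gradients."""
    raise NotImplementedError

def inner_product_test(G, theta=0.9):
    """Simplified (approximate) inner product test: the sample variance of
    g_i^T g_S, scaled by the batch size, should be small relative to
    ||g_S||^4, so that -g_S is a descent direction with high probability."""
    g = G.mean(axis=0)                        # mini-batch gradient g_S
    inner = G @ g                             # g_i^T g_S for each sample
    lhs = inner.var(ddof=1) / G.shape[0]      # variance of inner products / |S|
    return lhs <= theta**2 * np.dot(g, g)**2

def adaptive_sampling_sgd(x0, n_data, alpha=0.1, batch=16, growth=1.5, iters=200):
    """Gradient loop that enlarges the sample only when the test fails."""
    x = np.asarray(x0, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(n_data, size=min(batch, n_data), replace=False)
        G = per_example_gradients(x, idx)
        if not inner_product_test(G):
            batch = min(n_data, int(np.ceil(growth * batch)))  # increase |S|
        x -= alpha * G.mean(axis=0)           # step along the sampled gradient
    return x
```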

Cited by 88 publications (117 citation statements)
References 17 publications
“…Finally, unlike [4], where theoretical results require that |S_k| depends on ∇f(X_k), which is unknown, our bounds on the sample set sizes all use knowable quantities, such as a bound on the variance and quantities computed by the algorithm.…”
Section: 3 (mentioning)
confidence: 99%
“…Moreover, for them to obtain convergence rates matching those of GD in expectation, a small constant step size must be known in advance and the sample size needs to be increased at a prescribed rate, thus decreasing the variance of gradient estimates. Recently, in [4] an adaptive sample size selection strategy was proposed where the sample size is selected based on the reduction of the gradient (and not prescribed in advance). For convergence rates to be derived, however, an assumption has to be made that these sample sizes can be selected based on the size of the true gradient, which is, of course, unknown.…”
Section: Introduction (mentioning)
confidence: 99%
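As an illustration of the distinction drawn in this excerpt, a variance-based sample-size rule can be stated entirely in terms of computable quantities by substituting the mini-batch gradient for the unknown true gradient. The helper below is a hypothetical sketch under that assumption; the name `next_sample_size`, the threshold `theta`, and the capping logic are illustrative, not taken from the cited papers.

```python
import numpy as np

def next_sample_size(G, theta=0.9, cap=None):
    """Variance-based sample-size rule using only computable quantities.

    A theoretical rule would involve the true gradient norm ||grad f(x_k)||,
    which is unknown; here the mini-batch gradient g_S stands in for it, and
    the trace of the sample covariance estimates the gradient variance.
    G has shape (batch_size, dim), one per-example gradient per row.
    """
    g = G.mean(axis=0)                          # mini-batch gradient g_S
    var_trace = G.var(axis=0, ddof=1).sum()     # tr(sample covariance)
    size = int(np.ceil(var_trace / (theta**2 * np.dot(g, g))))
    return size if cap is None else min(size, cap)
```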
“…The selection of the mini-batch size depends on the satisfaction of a condition known as the norm test, which monitors the norm of the sample variance within the mini-batch. Similarly, Bollapragada et al. [38] propose an approximate inner product test, which ensures that search directions are descent directions with high probability and improves over the norm test. Furthermore, Metel [39] presents dynamic sampling rules to ensure that the gradient follows a descent direction with higher probability; this depends on a dynamic sampling of the mini-batch size that reduces the estimated sample covariance.…”
Section: Literature Review (mentioning)
confidence: 99%
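For contrast with the inner product test sketched earlier, a minimal version of the norm test mentioned in this excerpt could look as follows; the threshold `theta` and the per-example-gradient array layout are assumptions made for the sketch.

```python
import numpy as np

def norm_test(G, theta=0.9):
    """Norm test (sketch): the per-sample gradient variance, scaled by the
    batch size, must be small relative to ||g_S||^2.

    G has shape (batch_size, dim), one per-example gradient per row.
    """
    g = G.mean(axis=0)                          # mini-batch gradient g_S
    var_trace = G.var(axis=0, ddof=1).sum()     # trace of the sample covariance
    return var_trace / G.shape[0] <= theta**2 * np.dot(g, g)
```

The inner product test measures only the variance of the projections g_i^T g_S rather than the full gradient variance, which, per the excerpt above, improves over the norm test and typically admits smaller sample sizes.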
“…[Table excerpt: ResNet CIFAR-10 training errors (mean ± standard deviation) for dynamic-sampling schedule alternatives such as s16-to-256, s32-to-128, s32-to-512, s128-to-512, and s512-to-32, with and without the -MS variants.]…”
Section: ResNet CIFAR-10 Training Error (mentioning)
confidence: 99%