2018
DOI: 10.1137/17m1154679

Adaptive Sampling Strategies for Stochastic Optimization

Abstract: In this paper, we propose a stochastic optimization method that adaptively controls the sample size used in the computation of gradient approximations. Unlike other variance reduction techniques that either require additional storage or the regular computation of full gradients, the proposed method reduces variance by increasing the sample size as needed. The decision to increase the sample size is governed by an inner product test that ensures that search directions are descent directions with high probability […]
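The mechanism described in the abstract can be pictured as a stochastic-gradient loop whose batch grows whenever an approximate inner product test fails. The sketch below is illustrative only, assuming a hypothetical per-example gradient oracle `per_example_gradients`; the threshold `theta`, growth factor, and step size are placeholder values, not the paper's exact constants or implementation.

```python
import numpy as np

def per_example_gradients(x, idx):
    """Hypothetical oracle returning the per-sample gradients g_i(x),
    shape (len(idx), dim); replace with the problem's own gradients."""
    raise NotImplementedError

def inner_product_test(G, theta=0.9):
    """Simplified (approximate) inner product test: the sample variance of
    g_i^T g_S, scaled by the batch size, should be small relative to
    ||g_S||^4, so that -g_S is a descent direction with high probability."""
    g = G.mean(axis=0)                        # mini-batch gradient g_S
    inner = G @ g                             # g_i^T g_S for each sample
    lhs = inner.var(ddof=1) / G.shape[0]      # variance of inner products / |S|
    return lhs <= theta**2 * np.dot(g, g)**2

def adaptive_sampling_sgd(x0, n_data, alpha=0.1, batch=16, growth=1.5, iters=200):
    """Gradient loop that enlarges the sample only when the test fails."""
    x = np.asarray(x0, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(iters):
        idx = rng.choice(n_data, size=min(batch, n_data), replace=False)
        G = per_example_gradients(x, idx)
        if not inner_product_test(G):
            batch = min(n_data, int(np.ceil(growth * batch)))  # increase |S|
        x -= alpha * G.mean(axis=0)           # step along the sampled gradient
    return x
```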

Cited by 88 publications (117 citation statements)
References 17 publications
“…Finally, unlike [4], where theoretical results require that |S_k| depends on ∇f(X_k), which is unknown, our bounds on the sample set sizes all use knowable quantities, such as a bound on the variance and quantities computed by the algorithm.…”
Section: 3 (mentioning)
confidence: 99%
“…Moreover, for them to obtain convergence rates matching those of GD in expectation, a small constant step size must be known in advance and the sample size needs to be increased at a prescribed rate, thus decreasing the variance of gradient estimates. Recently, in [4] an adaptive sample size selection strategy was proposed where the sample size is selected based on the reduction of the gradient (and not prescribed in advance). For convergence rates to be derived, however, an assumption has to be made that these sample sizes can be selected based on the size of the true gradient, which is, of course, unknown.…”
Section: Introduction (mentioning)
confidence: 99%
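As an illustration of the distinction drawn in this excerpt, a variance-based sample-size rule can be stated entirely in terms of computable quantities by substituting the mini-batch gradient for the unknown true gradient. The helper below is a hypothetical sketch under that assumption; the name `next_sample_size`, the threshold `theta`, and the capping logic are illustrative, not taken from the cited papers.

```python
import numpy as np

def next_sample_size(G, theta=0.9, cap=None):
    """Variance-based sample-size rule using only computable quantities.

    A theoretical rule would involve the true gradient norm ||grad f(x_k)||,
    which is unknown; here the mini-batch gradient g_S stands in for it, and
    the trace of the sample covariance estimates the gradient variance.
    G has shape (batch_size, dim), one per-example gradient per row.
    """
    g = G.mean(axis=0)                          # mini-batch gradient g_S
    var_trace = G.var(axis=0, ddof=1).sum()     # tr(sample covariance)
    size = int(np.ceil(var_trace / (theta**2 * np.dot(g, g))))
    return size if cap is None else min(size, cap)
```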
“…The selection of the mini-batch size depends on the satisfaction of a condition known as the norm test, which monitors the norm of the sample variance within the mini-batch. Similarly, Bollapragada et al. [38] propose an approximate inner product test, which ensures that search directions are descent directions with high probability and improves over the norm test. Furthermore, Metel [39] presents dynamic sampling rules to ensure that the gradient follows a descent direction with higher probability; this depends on a dynamic sampling of the mini-batch size that reduces the estimated sample covariance.…”
Section: Literature Review (mentioning)
confidence: 99%
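For contrast with the inner product test sketched earlier, a minimal version of the norm test mentioned in this excerpt could look as follows; the threshold `theta` and the per-example-gradient array layout are assumptions made for the sketch.

```python
import numpy as np

def norm_test(G, theta=0.9):
    """Norm test (sketch): the per-sample gradient variance, scaled by the
    batch size, must be small relative to ||g_S||^2.

    G has shape (batch_size, dim), one per-example gradient per row.
    """
    g = G.mean(axis=0)                          # mini-batch gradient g_S
    var_trace = G.var(axis=0, ddof=1).sum()     # trace of the sample covariance
    return var_trace / G.shape[0] <= theta**2 * np.dot(g, g)
```

The inner product test measures only the variance of the projections g_i^T g_S rather than the full gradient variance, which, per the excerpt above, improves over the norm test and typically admits smaller sample sizes.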
“…[Table excerpt: ResNet CIFAR-10 training errors (mean ± standard deviation) for dynamic-sampling schedule alternatives such as s16-to-256, s32-to-128, s32-to-512, s128-to-512, and s512-to-32, with and without the -MS variants.]…”
Section: ResNet CIFAR-10 Training Error (mentioning)
confidence: 99%