Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions

Wang, Mengdi; Fang, Ethan X.; Liu, Han

doi:10.1007/s10107-016-1017-3

Cited by 176 publications

(349 citation statements)

References 29 publications

Supporting

Mentioning

343

Contrasting


“…For the case where T = 1, our results match the best known sample complexity upper- and lower-bounds. For the case where T = 2, our results improve the convergence rate from O(n^{-2/9}) of the a-SCGD in [26] to O(n^{-2/5}). Besides, with additional assumption that the inner level function f^{(T)} in (1.1) has Lipschitz continuous gradients, we obtain a convergence rate O(n^{-4/9}) for two-level problems, which matches the state-of-art result achieved by ASC-PG in [28].…”

mentioning

confidence: 62%


Multilevel Stochastic Gradient Methods for Nested Composition Optimization

Yang, Wang, Fang. 2019. SIAM J. Optim.

Self Cite


Stochastic gradient methods are scalable for solving large-scale optimization problems that involve empirical expectations of loss functions. Existing results mainly apply to optimization problems where the objectives are one- or two-level expectations. In this paper, we consider the multi-level compositional optimization problem that involves compositions of multi-level component functions and nested expectations over a random path. It finds applications in risk-averse optimization and sequential planning. We propose a class of multi-level stochastic gradient methods that are motivated from the method of multi-timescale stochastic approximation. First we propose a basic T-level stochastic compositional gradient algorithm, establish its almost sure convergence and obtain an n-iteration error bound O(n^{-1/2^T}). Then we develop accelerated multi-level stochastic gradient methods by using an extrapolation-interpolation scheme to take advantage of the smoothness of individual component functions. When all component functions are smooth, we show that the convergence rate improves to O(n^{-4/(7+T)}) for general objectives and O(n^{-4/(3+T)}) for strongly convex objectives. We also provide almost sure convergence and rate of convergence results for nonconvex problems. The proposed methods and theoretical results are validated using numerical experiments.
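As a quick sanity check on the accelerated rates stated in this abstract, one can substitute small values of T into the two exponents. This is only a worked substitution, not a result from the paper, and it assumes the rates are error bounds that decay in n, so each exponent is read with a negative sign.

```latex
\[
\left.\frac{4}{7+T}\right|_{T=1} = \frac{1}{2},
\qquad
\left.\frac{4}{7+T}\right|_{T=2} = \frac{4}{9},
\qquad
\left.\frac{4}{3+T}\right|_{T=1} = 1,
\qquad
\left.\frac{4}{3+T}\right|_{T=2} = \frac{4}{5}.
\]
```

For T = 1 these give n^{-1/2} and n^{-1}, the familiar single-level stochastic gradient rates for general and strongly convex objectives, and the T = 2 value 4/9 agrees with the two-level rate quoted in the first citation statement above.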


mentioning

confidence: 62%

“…We have also obtained convergence and rate of convergence results for nonconvex problems. Table 1 summarizes our results and compare them with the best known ones for the single- and two-level stochastic compositional optimization problems [9,19,23,26,28]. We also provide numerical experiments with a risk-averse regression problem.…”

mentioning

confidence: 98%

Multilevel Stochastic Gradient Methods for Nested Composition Optimization

Yang, Wang, Fang. 2019. SIAM J. Optim.

Self Cite


“…(1.5) and (1.6)) in optimization, see [27]. Stochastic compositional problems have also appeared in the parallel line of work [66]. There, the authors require the entire composite function to be either convex or smooth.…”

mentioning

confidence: 99%

Stochastic Model-Based Minimization of Weakly Convex Functions

Davis. 2019.


We consider a family of algorithms that successively sample and minimize simple stochastic models of the objective function. We show that under reasonable conditions on approximation quality and regularity of the models, any such algorithm drives a natural stationarity measure to zero at the rate O(k^{-1/4}). As a consequence, we obtain the first complexity guarantees for the stochastic proximal point, proximal subgradient, and regularized Gauss-Newton methods for minimizing compositions of convex functions with smooth maps. The guiding principle, underlying the complexity guarantees, is that all algorithms under consideration can be interpreted as approximate descent methods on an implicit smoothing of the problem, given by the Moreau envelope. Specializing to classical circumstances, we obtain the long-sought convergence rate of the stochastic projected gradient method, without batching, for minimizing a smooth function on a closed convex set.


“…An established approach was to use two-level stochastic recursive algorithms with two stepsize sequences in different time scales: a slower one for updating the main decision variable x, and a faster one for tracking the value of the inner function(s). References [38,39] provide a detailed account of these techniques and existing results. In [40] these ideas were extended to multilevel problems of form (1), albeit with multiple time scales and under continuous differentiability assumptions.…”

Section: Introduction

mentioning

confidence: 99%
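The two-timescale scheme described in the statement above (a slow stepsize for the decision variable x, a faster one for an auxiliary variable tracking the inner function value) can be sketched on a toy scalar problem. This is a minimal illustration, not the method of any cited paper: the function name `scgd`, the noisy linear inner map, the quadratic outer function, the stepsize exponents 3/4 and 1/2, and all constants are assumptions chosen for the example.

```python
import random


def scgd(a=2.0, b=3.0, noise=0.3, iters=20000, seed=0):
    """Two-timescale sketch for min_x f(E[g_w(x)]) on a toy scalar problem.

    Assumed toy model (hypothetical, for illustration only):
      inner map   g_w(x) = (a + w) * x,        w zero-mean noise
      outer map   f(y)   = 0.5 * (y - b) ** 2, gradient observed with noise v
    The composite objective is 0.5 * (a * x - b) ** 2, minimized at x = b / a.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # decision variable and running estimate of E[g_w(x)]
    for k in range(1, iters + 1):
        alpha = 0.2 * k ** -0.75  # slower stepsize: updates x
        beta = k ** -0.5          # faster stepsize: tracks the inner value

        # One sample of the inner function value and its derivative.
        w = rng.gauss(0.0, noise)
        g_val = (a + w) * x
        g_grad = a + w

        # Fast timescale: weighted average tracking the inner expectation.
        y = (1.0 - beta) * y + beta * g_val

        # Noisy outer gradient, evaluated at the tracked value y
        # rather than at the single noisy sample g_val.
        v = rng.gauss(0.0, noise)
        f_grad = (y - b) + v

        # Slow timescale: chain-rule step on the decision variable.
        x -= alpha * g_grad * f_grad
    return x, y


if __name__ == "__main__":
    x_hat, y_hat = scgd()
    print(f"x estimate: {x_hat:.3f} (target 1.5), tracked inner value: {y_hat:.3f}")
```

Evaluating the outer gradient at the averaged value y is the point of the second timescale: plugging in the single sample g_val instead would target E[f(g_w(x))], which in this toy model has a different minimizer than f(E[g_w(x)]).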

“…Denote d^k = y^k − x^k. By a discrete-time version of the argument that lead to (38), the decrease of the first part of the Lyapunov function can be estimated as follows:…”

mentioning

confidence: 99%

Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization

Ruszczyński. 2020. Optim Lett.


We propose a single time-scale stochastic subgradient method for constrained optimization of a composition of several nonsmooth and nonconvex functions. The functions are assumed to be locally Lipschitz and differentiable in a generalized sense. Only stochastic estimates of the values and generalized derivatives of the functions are used. The method is parameter-free. We prove convergence with probability one of the method, by associating with it a system of differential inclusions and devising a nondifferentiable Lyapunov function for this system. For problems with functions having Lipschitz continuous derivatives, the method finds a point satisfying an optimality measure with error of order 1/√N, after executing N iterations with constant stepsize.
