Year: 2020
DOI: 10.1007/s11590-020-01537-8
Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization

Abstract: We propose a single time-scale stochastic subgradient method for constrained optimization of a composition of several nonsmooth and nonconvex functions. The functions are assumed to be locally Lipschitz and differentiable in a generalized sense. Only stochastic estimates of the values and generalized derivatives of the functions are used. The method is parameter-free. We prove convergence with probability one of the method, by associating with it a system of differential inclusions and devising a nondifferentiable…
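To make the kind of scheme described in the abstract more concrete, here is a minimal sketch of a projected stochastic subgradient iteration with direction averaging on a toy problem. It is not the paper's algorithm: the objective, constraint set, step sizes, and averaging weights below are all assumptions chosen for illustration, and the toy objective is convex, whereas the paper targets compositions of nonsmooth nonconvex functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (an assumption, not the paper's example):
#   minimize F(x) = E[ |a^T x - b| ]   subject to   x in [-1, 1]^d,
# with random data (a, b).
d = 5
x_true = rng.uniform(-0.5, 0.5, size=d)

def sample():
    a = rng.normal(size=d)
    b = a @ x_true + 0.1 * rng.normal()
    return a, b

def stochastic_subgradient(x):
    # A subgradient of x -> |a^T x - b| is sign(a^T x - b) * a.
    a, b = sample()
    return np.sign(a @ x - b) * a

def project(x, lo=-1.0, hi=1.0):
    # Euclidean projection onto the box constraint.
    return np.clip(x, lo, hi)

x = np.zeros(d)
z = np.zeros(d)  # running average of stochastic subgradients
for k in range(1, 20001):
    tau = 1.0 / np.sqrt(k)    # step size (assumed schedule)
    beta = 1.0 / k ** 0.75    # averaging weight (assumed schedule)
    z = (1.0 - beta) * z + beta * stochastic_subgradient(x)
    x = project(x - tau * z)  # projected step along the averaged direction

print("final iterate:", np.round(x, 3))
print("ground truth :", np.round(x_true, 3))
```

The averaged direction z acts as a filtered subgradient estimate; in a single time-scale scheme of the kind the abstract describes, the iterate and the averaged direction are updated with step sizes of comparable order rather than on separated time scales.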

Cited by 20 publications (7 citation statements)
References 32 publications

Citation statements:
“…Concurrently with our work, an adaptive and accelerated SCGD has been studied in [7], but the updates of [5,7] are different from ours, and thus their convergence rates are still slower than ours and than that of SGD in the non-compositional case. While most of the existing algorithms for stochastic compositional problems rely on two-timescale stepsizes, the single-timescale approach has recently been developed in [8,9]. Our improvements over [8,9] are: i) a different and simpler algorithm that tracks only two sequences instead of three; ii) a neat ODE analysis backing up our algorithm development, which may stimulate future development; and, more importantly, iii) the simplicity of our algorithm makes it easy to adopt other techniques such as the Adam update.…”
Section: Prior Art (mentioning)
confidence: 99%
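For context, the setting discussed in this statement is stochastic compositional optimization. An illustrative template of the problem and of a single-timescale update that tracks one auxiliary sequence (in addition to the iterate) is sketched below; the exact recursions, weight sequences, and assumptions of [8,9] and of the citing work differ, so the notation here is assumed rather than taken from those papers.

```latex
% Stochastic compositional problem (illustrative form):
\min_{x}\; F(x) \;=\; f\bigl(\mathbb{E}_{\xi}[\,g(x;\xi)\,]\bigr)

% Single-timescale template with iterate x_k and one tracking sequence u_k
% estimating the inner expectation \mathbb{E}_{\xi}[g(x_k;\xi)]:
u_{k+1} \;=\; (1-\beta_k)\,u_k + \beta_k\, g(x_k;\xi_k),
\qquad
x_{k+1} \;=\; x_k - \alpha_k\, \nabla g(x_k;\xi_k)^{\top} \nabla f(u_{k+1}),
% with \alpha_k and \beta_k of the same order (a single time scale).
```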
“…Although subgradient methods are mainly applied to convex problems, recent research shows that they also enjoy theoretical guarantees on nonconvex problems. Several recent works [197][198][199] proved that subgradient methods converge asymptotically to stationary points for a fairly broad class of nonconvex models (including neural networks). For large-scale nonsmooth nonconvex optimization, the work [200]…”
Section: Subgradient Methods (unclassified)
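As a tiny illustration of the point made in this statement, the sketch below runs a plain subgradient method with diminishing steps on the one-dimensional nonsmooth nonconvex function f(x) = ||x| - 1| and lands near one of its stationary points; the function, step sizes, and starting point are assumptions made purely for illustration and are unrelated to the cited works [197-200].

```python
import numpy as np

def f(x):
    # Nonsmooth, nonconvex 1-D test function: minimizers at x = -1 and x = 1,
    # and a nonsmooth local maximizer at x = 0.
    return abs(abs(x) - 1.0)

def subgradient(x):
    # One valid Clarke subgradient of f at x; at kink points any element of
    # the Clarke subdifferential would do, so we simply pick one.
    if x == 0.0:
        return 1.0
    return np.sign(abs(x) - 1.0) * np.sign(x)

x = 0.3  # starting point (assumption)
for k in range(1, 5001):
    x -= (1.0 / k) * subgradient(x)  # diminishing-step subgradient step

print(f"x = {x:.4f}, f(x) = {f(x):.6f}")  # ends up near the stationary point x = 1
```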
“…where P is a probability distribution on some measurable space (S, A). This type of problem has many applications; we refer, for instance, to [44,45] and references therein for various examples in several fields. Our specific interest goes in particular to online deep learning [46] and machine learning more broadly [16].…”
Section: Introduction (mentioning)
confidence: 99%
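The quotation above starts mid-formula; the generic problem class it refers to is stochastic minimization of an expectation, which (in assumed notation, not the citing paper's exact display) reads:

```latex
\min_{x \in X}\; F(x) \;:=\; \mathbb{E}_{s \sim P}\bigl[f(x,s)\bigr]
\;=\; \int_{S} f(x,s)\,\mathrm{d}P(s),
```

with P a probability distribution on the measurable space (S, A) and f(·, s) locally Lipschitz, possibly nonsmooth and nonconvex, for each s.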
“…An essential series of works on subgradient sampling is the one developed in [27], followed by [44,45], using the generalized derivatives and the calculus introduced by Norkin [35]; see Section 6 for more details on this notion. These works address in particular the question of the interplay between expectations and subgradients under various assumptions, and also provide convergence results to "generalized critical points".…”
Section: Introduction (mentioning)
confidence: 99%
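The "interplay between expectations and subgradients" mentioned here is typically an interchange result of the following kind, written with the Clarke subdifferential and under unstated integrability and regularity assumptions; this is a standard illustration, not the exact hypotheses of [27,35,44,45]:

```latex
\partial F(x) \;\subseteq\; \mathbb{E}_{s \sim P}\bigl[\partial_x f(x,s)\bigr],
\qquad F(x) \;=\; \mathbb{E}_{s \sim P}\bigl[f(x,s)\bigr],
```

with equality when f(·, s) is (Clarke) regular for P-almost every s. Sampling a subgradient of f(·, s) at the current iterate yields an element of the right-hand side, and results of this type are what allow the limit points of such methods to be interpreted as generalized critical points.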