2018
DOI: 10.48550/arxiv.1810.02565
Preprint

Continuous-time Models for Stochastic Optimization Algorithms

Antonio Orvieto,
Aurelien Lucchi

Abstract: We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with simple Lyapunov analysis as well as tools from stochastic calculus, in order to derive convergence bounds for various types of non-convex functions. Guided by such analysis, we show that the same Lyapunov arguments hold in discrete-time, leading to matching rates. In addition, we use these models a…
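
To make the abstract's idea concrete, the sketch below integrates the gradient-flow ODE dx/dt = -∇f(x), the deterministic continuous-time model underlying gradient descent, and checks the basic Lyapunov argument that f(x(t)) is non-increasing along the trajectory. This is a minimal illustration of the general idea, not the paper's construction; the toy objective, step sizes, and function names are assumptions made here for the example.

```python
import numpy as np

def f(x):
    # Toy smooth non-convex objective (illustrative choice, not from the paper).
    return 0.25 * np.sum(x**4) - 0.5 * np.sum(x**2)

def grad_f(x):
    return x**3 - x

def gradient_flow(x0, T=5.0, n_steps=5000):
    # Forward-Euler integration of the GD-ODE  dx/dt = -grad f(x),
    # the deterministic continuous-time model of gradient descent.
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        x = x - dt * grad_f(x)
        traj.append(x.copy())
    return np.array(traj)

traj = gradient_flow(x0=np.array([1.7, -0.3]))
values = np.array([f(x) for x in traj])

# Lyapunov-style check: along the ODE, d/dt f(x(t)) = -||grad f(x(t))||^2 <= 0,
# so the objective should be (numerically) non-increasing along the trajectory.
print("monotone decrease:", bool(np.all(np.diff(values) <= 1e-12)))
print("f(x(0)) =", values[0], " f(x(T)) =", values[-1])
```
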

Cited by 4 publications (8 citation statements); references 30 publications.

“…In this work, the authors use backward error analysis to study how close SGD is to its approximation using a high order (involving the Hessian) stochastic modified equation. It would be interesting to derive a similar result for a stochastic variant of GD-ODE, such as the one studied in [40].…”
Section: Discussion (mentioning)
confidence: 97%
“…which is inspired by the Lyapunov function in [22] (end of page 22). Differentiating, using Lemma 12 and α-weakly-quasi-convexity, we get the result.…”
Section: Discussion (mentioning)
confidence: 99%
“…In lieu of the inability of convex optimization theory to explain the behavior of SGD in non-convex settings, it is common to consider the behavior of Markov process models for stochastic optimizers [38]. These models are often continuous for ease of analysis [50,58], although discrete-time treatments have become increasingly popular [16,28]. Such continuous-time models are formulated as stochastic differential equations dW_t = μ(W_t) dt + σ(W_t) dX_t, where X_t is typically Brownian motion, or some other Lévy process, and derived through the (generalized) central limit theorem and taking learning rates to zero [23].…”
Section: Optimization-based Generalization (mentioning)
confidence: 99%
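
The quoted passage formulates continuous-time optimizer models as SDEs dW_t = μ(W_t) dt + σ(W_t) dX_t. Below is a minimal Euler-Maruyama sketch of such a model with a Brownian driver; the drift, diffusion, and function names are illustrative assumptions rather than anything taken from the cited works, and a heavy-tailed (e.g., α-stable Lévy) increment sampler could be swapped in for the default Gaussian one.

```python
import numpy as np

rng = np.random.default_rng(1)

def drift(w):
    # Illustrative drift mu(W_t): negative gradient of a toy quadratic.
    return -w

def diffusion(w):
    # Illustrative state-dependent diffusion sigma(W_t).
    return 0.3 * (1.0 + 0.1 * np.abs(w))

def simulate(w0, T=1.0, n_steps=1000, increment=None):
    """Euler-Maruyama scheme for dW_t = mu(W_t) dt + sigma(W_t) dX_t.

    `increment(dt, shape)` samples the driving-noise increment over a time
    step dt; the default is Brownian motion (Gaussian with variance dt).
    """
    if increment is None:
        increment = lambda dt, shape: np.sqrt(dt) * rng.standard_normal(shape)
    dt = T / n_steps
    w = np.array(w0, dtype=float)
    path = [w.copy()]
    for _ in range(n_steps):
        w = w + drift(w) * dt + diffusion(w) * increment(dt, w.shape)
        path.append(w.copy())
    return np.array(path)

path = simulate(np.ones(2))
print("final state:", path[-1])
```

Passing the increment sampler as a parameter keeps the drift/diffusion logic identical whether the driving process X_t is Brownian or a heavier-tailed alternative.
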
“…A precise link to stochastic optimization comes through the following observation: suppose that W_t, t ∈ [0, 1], is a continuous-time Markov process with transition kernel P_t(x, E), e.g., a continuous-time model of a stochastic optimizer [50]. Combining [65, Theorem 4.1] and [45, Theorem 5.7], if W_t is spatially homogeneous, and…”
Section: Corollary 1, Suppose That W Has Hausdorff Dimension α and Is ... (mentioning)
confidence: 99%