2021
DOI: 10.48550/arxiv.2107.08649
Preprint

Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function

Abstract: We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a non-asymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2021). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the ex…
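The abstract describes TUSLA only informally. As an illustration of the general idea, a single tamed Langevin step can be sketched as below. This is a minimal sketch, not the paper's exact scheme: the taming factor 1 + sqrt(lam)*|theta|^r, the parameter values, and the toy gradient are assumptions chosen for illustration.

```python
import numpy as np

def tusla_step(theta, stoch_grad, lam=1e-2, beta=1e8, r=3, rng=None):
    """One tamed unadjusted stochastic Langevin step (illustrative sketch).

    The stochastic gradient is divided by a taming factor that grows
    polynomially in |theta|, so super-linearly growing gradients cannot
    make the iterates explode; Gaussian noise of scale sqrt(2*lam/beta)
    is then added, as in Langevin-type schemes.
    """
    if rng is None:
        rng = np.random.default_rng()
    g = stoch_grad(theta)
    taming = 1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** r
    noise = rng.standard_normal(theta.shape)
    return theta - lam * g / taming + np.sqrt(2.0 * lam / beta) * noise

# Toy objective |theta|^4 / 4: its gradient theta * |theta|^2 grows super-linearly.
def toy_grad(theta):
    return theta * np.dot(theta, theta)

theta = np.array([10.0, -10.0])
for _ in range(200):
    theta = tusla_step(theta, toy_grad)
# The tamed iterates stay finite and contract toward the minimiser at 0.
```

Without the taming factor, a plain Euler step of size 1e-2 from this starting point would diverge immediately, since the gradient grows cubically in |theta|.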

Cited by 4 publications (11 citation statements)
References 33 publications
“…the Hessian of U. This additional smoothness condition is similar to Assumption H4 of [Bro+19] and we impose it in order to obtain improved convergence rates compared to those derived in [Lov+20] and [Lim+21] for TUSLA.…”
Section: Assumptions and Main Results
Confidence: 99%
“…In [Lim+21], the polynomial Lipschitz condition imposed in Assumptions 2 and 3 of [Lim+21], and the convex at infinity condition imposed in Assumption 4 of [Lim+21], are directly analogous to Assumptions 2 and 3 of this paper in the case of a deterministic gradient. With our choice of taming factor as $(1+\lambda|\theta|^{2r})^{1/2}$, see (7), as opposed to $(1+\sqrt{\lambda}\,|\theta|^{r})$ in [Lim+21], as well as assuming additional smoothness conditions on U in Assumption 4 of our work, we were able to improve the orders of convergence of 1/2 and 1/4 for Wasserstein distances of orders 1 and 2, respectively, derived in [Lim+21], to 1 and 1/2, respectively, in Theorems 2.7 and 2.8.…”
Section: 3
Confidence: 97%
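The two taming factors compared in this quote share the same leading-order growth $\sqrt{\lambda}\,|\theta|^{r}$ but differ at moderate $|\theta|$; a quick numerical check illustrates this (the values of lam and r here are arbitrary illustrations, not the papers' choices):

```python
import numpy as np

lam, r = 0.01, 3
norms = np.array([1.0, 10.0, 100.0, 1000.0])  # sample values of |theta|

# Taming factor from the quote's equation (7): (1 + lam*|theta|^{2r})^{1/2}
t_sqrt = np.sqrt(1.0 + lam * norms ** (2 * r))
# Taming factor attributed to [Lim+21]: 1 + sqrt(lam)*|theta|^r
t_lim = 1.0 + np.sqrt(lam) * norms ** r

ratio = t_sqrt / t_lim  # tends to 1 as |theta| grows
```

Since both factors scale like $\sqrt{\lambda}\,|\theta|^{r}$ for large $|\theta|$, the ratio approaches 1; the improved convergence rates quoted above come from the factor's behaviour at moderate norms together with the additional smoothness assumptions, not from a different growth order.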