2020
DOI: 10.48550/arxiv.2006.08212
Preprint
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Abstract: In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation Y = ⟨θ*, Φ(U)⟩ between the random output Y and the random feature vector Φ(U), a potentially non-linear transformation of the inputs U. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum θ* and the decay of the generalization error follow polynomial…
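To make the setting concrete, here is a minimal sketch of the procedure the abstract describes: single-pass, fixed step-size SGD on the least-squares risk when Y = ⟨θ*, Φ(U)⟩ holds exactly. The Gaussian design, dimensions, and step size below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 50, 10_000                                  # feature dimension, number of samples (single pass)
theta_star = rng.standard_normal(d) / np.sqrt(d)   # hypothetical ground-truth parameter
gamma = 0.01                                       # fixed step size

theta = np.zeros(d)                                # SGD iterate, started at zero
for _ in range(n):
    phi = rng.standard_normal(d)   # feature vector Phi(U); illustrative Gaussian design
    y = phi @ theta_star           # noiseless linear model: Y = <theta*, Phi(U)>, no noise term
    # one stochastic gradient step on the least-squares loss (1/2)(<theta, phi> - y)^2
    theta -= gamma * (phi @ theta - y) * phi

print("distance to optimum:", np.linalg.norm(theta - theta_star))
```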

Cited by 6 publications (29 citation statements)
References 22 publications
“…Our variance analysis is sharper than Dieuleveut and Bach (2015) in that we provide a bound in terms of the full spectrum (with the same R² assumption) along with a lower bound, while Dieuleveut and Bach (2015) assume decay conditions on the spectrum. The bias analysis in Dieuleveut and Bach (2015); Berthier et al. (2020) relies on a stronger assumption in that ‖H^{−α} w*‖₂ must be finite, where α > 0 is a constant (e.g., A4 in Bach and Moulines (2013) and Theorem 1 condition (a) in Berthier et al. (2020)); our conjecture is that without relying on stronger fourth moment assumptions (such as those consistent with sub-Gaussians), such dependencies are not avoidable. Our fourth moment assumption is a natural starting point for analyzing the over-parameterized regime because it also allows for direct comparisons to OLS and ridge regression, as discussed above.…”
Section: Further Related Work
confidence: 93%
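For context, the finiteness condition quoted above is a source condition on the optimum. Here is a minimal sketch of one standard way to write it, with H = E[Φ(U)Φ(U)ᵀ] and eigendecomposition H = Σᵢ λᵢ vᵢvᵢᵀ (exponent conventions vary between papers):

```latex
% Source condition: w* must have fast-decaying coefficients along the
% small-eigenvalue directions of H; larger alpha is a stronger
% regularity assumption on w*.
\[
  \| H^{-\alpha} w^* \|_2^2
  \;=\; \sum_i \lambda_i^{-2\alpha} \, \langle w^*, v_i \rangle^2
  \;<\; \infty,
  \qquad \alpha > 0 .
\]
```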
“…We make this assumption to emphasize that our variance analysis does not rely on stronger assumptions than those in a number of prior works for iterate averaged SGD (Bach and Moulines, 2013; Jain et al., 2017b; Berthier et al., 2020). Moreover, note that this assumption is implied by Assumption 2.2 by setting A = I, which gives R² = α tr(H).…”
Section: B.3 Bounding the Variance Error
confidence: 99%
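A sketch of the arithmetic behind the last sentence, assuming Assumption 2.2 takes the standard fourth-moment form used in this literature (the exact statement is in the citing paper):

```latex
% Assumed form of Assumption 2.2: for every PSD matrix A, with H = E[x x^T],
%     E[ x x^T A x x^T ]  \preceq  alpha * tr(A H) * H   (PSD order).
% Taking A = I recovers the R^2 condition of Dieuleveut and Bach (2015):
\[
  \mathbb{E}\!\left[ \|x\|^2 \, x x^\top \right]
  \;\preceq\; \alpha \,\mathrm{tr}(H)\, H
  \qquad\Longrightarrow\qquad
  R^2 = \alpha \,\mathrm{tr}(H).
\]
```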
“…More refined assumptions can lead to completely different convergence rates. In the context of kernel methods and SGD studies, it is common to consider power law spectral conditions ("source condition" and "capacity condition") that are known to give power law convergence rate bounds O(n^{−ξ}) with different exponents ξ (Berthier et al., 2020; Zou et al., 2021; Varre et al., 2021).…”
Section: Introduction
confidence: 99%
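For reference, one common way these two spectral conditions are parameterized (conventions differ across the cited papers, so the exponents here are illustrative):

```latex
% Capacity condition: polynomial eigenvalue decay of the covariance H.
% Source condition: regularity of the optimum theta* relative to H.
\[
  \lambda_i \asymp i^{-b} \;\; (b > 1),
  \qquad
  \| H^{-r} \theta^* \| < \infty \;\; (r > 0).
\]
% Under such conditions the excess risk after n SGD steps decays
% polynomially, O(n^{-\xi}), with an exponent \xi depending on b and r.
```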