2022
DOI: 10.48550/arxiv.2201.06656
Preprint

Generalization in Supervised Learning Through Riemannian Contraction

Abstract: We prove that Riemannian contraction in a supervised learning setting implies generalization. Specifically, we show that if an optimizer is contracting in some Riemannian metric with rate λ > 0, it is uniformly algorithmically stable with rate O(1/λn), where n is the number of labelled examples in the training set. The results hold for stochastic and deterministic optimization, in both continuous and discrete time, for convex and non-convex loss surfaces. The associated generalization bounds reduce to well-kno…
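To make the abstract's chain of implications concrete, here is a minimal numerical sketch (not code from the paper): gradient descent on a λ-strongly-convex quadratic is a contraction, and training on two datasets that differ in a single example yields parameter vectors that stay roughly O(1/(λn)) apart, which is the uniform algorithmic stability the abstract refers to. The loss, the constants, and the heuristic C/(λn) scale below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, eta, steps = 200, 5, 0.5, 0.1, 500

# synthetic regression data (illustrative)
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad(w, X, y):
    # gradient of the lam-strongly-convex loss (1/n)*sum_i 0.5*(x_i.w - y_i)^2 + 0.5*lam*||w||^2
    return X.T @ (X @ w - y) / len(y) + lam * w

# neighbouring dataset: replace a single training example
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=d), rng.normal()

w1, w2 = np.zeros(d), np.zeros(d)
for _ in range(steps):
    # the GD map w -> w - eta*grad(w) contracts here (factor about 1 - eta*lam)
    w1 = w1 - eta * grad(w1, X, y)
    w2 = w2 - eta * grad(w2, X2, y2)

gap = np.linalg.norm(w1 - w2)
print(f"parameter gap between neighbouring runs: {gap:.4f}")
print(f"heuristic O(1/(lam*n)) scale:            {1 / (lam * n):.4f}")
```

Re-running the sketch with a larger n shrinks the gap roughly in proportion to 1/n, in line with the O(1/λn) stability rate stated in the abstract.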

Cited by 2 publications (9 citation statements)
References 17 publications
“…The constants "Γ" and "∆" are defined as Γ ≜ γ/β − γ/(β + γ) and ∆ ≜ 4βC log(e²⌈β/γ⌉)/(β − C/2). For a constant stepsize (i.e., depending only on β, γ) and T = C log(n) (with C = (β/γ + 1)/2), we show bounds of the order O(1/n²), while prior works [6,43,44] on SGD provide rates of the order O(1/n). This means that full batch GD provably attains improved generalization error rates by one order of magnitude.…”
Section: Related Work (mentioning)
confidence: 67%
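As a rough illustration of the orders quoted in this statement (not the cited paper's computation), the sketch below compares the claimed full-batch GD rate O(1/n²) at horizon T = C log(n) with the O(1/n) rate attributed to prior SGD analyses; the values of β and γ are hypothetical placeholders.

```python
# Hypothetical beta, gamma; C = (beta/gamma + 1)/2 and T = C*log(n) as in the quoted statement.
import math

beta, gamma = 1.0, 0.1
C = (beta / gamma + 1) / 2

for n in (10**3, 10**4, 10**5):
    T = C * math.log(n)  # number of full-batch GD iterations
    print(f"n={n:>6}  T≈{T:5.1f}  full-batch GD O(1/n^2)≈{1/n**2:.1e}  prior SGD O(1/n)≈{1/n:.1e}")
```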
“…This means that full batch GD provably attains improved generalization error rates by one order of magnitude. Additionally, for convex losses and for a fixed step-size η_t = 1/β and T ≤ n, we show tighter generalization error bounds of the order O(1/n), while bounds in prior work are of the order O(T/n) [6,44]. As a consequence, for smooth convex losses with T ≤ n, we provide tighter full-batch GD generalization error bounds than existing bounds on SGD.…”
Section: Related Work (mentioning)
confidence: 69%
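For the convex case in this statement, a quick back-of-the-envelope comparison (illustrative values only, with step size 1/β and T ≤ n as quoted) shows why an O(1/n) bound is tighter than an O(T/n) bound once the horizon T grows:

```python
n = 10**4
for T in (10**2, 10**3, 10**4):  # horizons T <= n
    print(f"T={T:>5}:  full-batch GD O(1/n)≈{1/n:.1e}   prior O(T/n)≈{T/n:.1e}")
```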
“…Generalization due to contractivity. Contractivity of SGD has also been used to derive generalization bounds in the concurrent work [KWS22]. Although the localized covering construction in Lemma 2.1 relies on the same principle, our results differ from those in [KWS22] on two key aspects.…”
Section: Relation To Fractal Dimension (mentioning)
confidence: 80%