The Implicit Bias of Benign Overfitting

Shamir, Ohad

doi:10.48550/arxiv.2201.11489

Cited by 1 publication

(1 citation statement)

References 10 publications

“…While both benign underfitting and benign overfitting challenge traditional generalization techniques, that postulate the training error to represent the test error, as we discuss above these two phenomena point to very different regimes of learning. In particular, Shamir (2022) shows that benign overfitting requires distributional assumptions for the interpolating algorithm to succeed. In contrast, we show that benign underfitting happens for SGD in a setting where it provably learns (namely, SCO), without any distributional assumptions.…”

Section: Additional Related Work (mentioning)

confidence: 99%

Benign Underfitting of Stochastic Gradient Descent

Koren, Livni, Mansour et al. 2022

Preprint

We study to what extent may stochastic gradient descent (SGD) be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data. We consider the fundamental stochastic convex optimization framework, where (one pass, without-replacement) SGD is classically known to minimize the population risk at rate O(1/√n), and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap of Ω(1). Consequently, it turns out that SGD is not algorithmically stable in any sense, and its generalization ability cannot be explained by uniform convergence or any other currently known generalization bound technique for that matter (other than that of its classical analysis). We then continue to analyze the closely related with-replacement SGD, for which we show that an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate. Finally, we interpret our main results in the context of without-replacement SGD for finite-sum convex optimization problems, and derive upper and lower bounds for the multi-epoch regime that significantly improve upon previously known results.
