2017
DOI: 10.48550/arxiv.1710.10345
Preprint

The Implicit Bias of Gradient Descent on Separable Data

Abstract: We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself.
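To make the claim concrete, here is a minimal numerical sketch (ours, not from the paper; it assumes numpy, scipy, and scikit-learn are available): gradient descent on the unregularized logistic loss over a separable two-class dataset, with the normalized iterate compared against an approximate hard-margin SVM direction and the growing norm printed along the way.

# Minimal sketch (not from the paper): gradient descent on the unregularized
# logistic loss over linearly separable data. The normalized iterate
# w_t / ||w_t|| is compared against the hard-margin SVM direction, and
# ||w_t|| is printed to show its slow growth.
import numpy as np
from scipy.special import expit          # numerically stable sigmoid
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Two well-separated Gaussian clusters in 2D, labels in {-1, +1}.
n = 100
X = np.vstack([rng.normal([+2.0, +2.0], 0.3, size=(n, 2)),
               rng.normal([-2.0, -2.0], 0.3, size=(n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Approximate hard-margin SVM direction (homogeneous predictor, very large C).
svm = LinearSVC(C=1e6, fit_intercept=False, max_iter=100_000).fit(X, y)
w_svm = svm.coef_.ravel()
w_svm /= np.linalg.norm(w_svm)

# Gradient descent on L(w) = mean_i log(1 + exp(-y_i w^T x_i)).
w = np.zeros(2)
step = 0.1
for t in range(1, 200_001):
    margins = y * (X @ w)
    grad = -(X * (y * expit(-margins))[:, None]).mean(axis=0)
    w -= step * grad
    if t in (100, 1_000, 10_000, 100_000, 200_000):
        w_dir = w / np.linalg.norm(w)
        print(f"t={t:7d}  ||w||={np.linalg.norm(w):7.2f}  "
              f"cos(w_t, w_svm)={w_dir @ w_svm:.6f}")

On this data the printed cosine should creep toward 1 while ||w_t|| keeps growing without bound, consistent with the slow, direction-only convergence described in the abstract.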

Cited by 26 publications (49 citation statements)
References 3 publications
“…(III) Implicit bias of SGD: Numerous empirical results have already shown that RNNs trained by stochastic gradient descent (SGD) algorithms have superior generalization performance. There have been a few theoretical results showing that SGD tends to yield low-complexity models, which can generalize (Neyshabur et al., 2014, 2015; Zhang et al., 2016; Soudry et al., 2017). Can we extend this argument to RNNs?…”
Section: Extensions to MGU, LSTM and Conv RNNs (mentioning)
confidence: 99%
“…One study shows that the logistic regression model and 2-layer neural networks using monotone decreasing loss functions tend to converge in the direction of the max-margin solution when using GD and SGD (Soudry et al., 2017). We further extend this conclusion with studies on more practical deep learning systems.…”
Section: Related Work (mentioning)
confidence: 67%
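For the homogeneous linear case this statement refers to, the convergence-in-direction claim can be written as follows (our notation, a sketch of the standard hard-margin formulation rather than a verbatim statement from the paper):

\[
  \lim_{t \to \infty} \frac{w(t)}{\lVert w(t) \rVert}
  = \frac{\hat{w}}{\lVert \hat{w} \rVert},
  \qquad
  \hat{w} = \operatorname*{arg\,min}_{w} \lVert w \rVert^{2}
  \quad \text{s.t.} \quad y_i\, w^{\top} x_i \ge 1 \;\; \forall i,
\]

where w(t) are the gradient-descent iterates on an unregularized, monotone decreasing loss (e.g., logistic) over a linearly separable dataset {(x_i, y_i)}.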
“…As deep neural networks remain mysterious in many ways, many researchers have tried to reveal their inner logic starting from shallow models (Mianjy et al., 2018; Soudry et al., 2017; Gunasekar et al., 2017). It is useful to appeal to the simple case of shallow neural network models to see if there are parallel insights that can help us understand generalization better before we move on to the deep learning systems in the next section.…”
Section: Shallow Neural Network Experiments (mentioning)
confidence: 99%
“…This is in contrast to the SVM method which can be used to find a particularly good (i.e., large margin) linear separator. The behavior of computational schemes for LR when the dataset is separable is not so well understood in theory, though there is recent work on first-order methods [16], [31], [18], [14], [19]. One of the main goals of this paper is to formalize the "informal" computational and statistical intuitions regarding logistic regression and to provide formal results that validate (or run counter to) such intuitive statements.…”
Section: Introduction (mentioning)
confidence: 99%
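The computational subtlety alluded to here is that, on separable data, the logistic loss has an infimum of zero that is never attained (a standard observation, restated in our own notation rather than quoted from the cited works):

\[
  \mathcal{L}(w) = \frac{1}{n} \sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i w^{\top} x_i}\bigr),
  \qquad \inf_{w} \mathcal{L}(w) = 0,
\]

since for any strict separator w (with y_i w^T x_i > 0 for all i), L(c w) tends to 0 as c grows. First-order iterates therefore diverge in norm, and only their direction can converge, which is exactly the regime analyzed in the paper above.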