2019
DOI: 10.1109/tit.2018.2854560
Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks

Abstract: In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set. We study this problem in the over-parameterized regime where the number of observations is smaller than the number of parameters in the model. We show that with quadratic activations the optimization landscape of training such shallow neural networks has certain favorable characteristics that allow globally optimal models to be found efficiently using a variety of local search heuristics. This …
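As a purely illustrative sketch of the setting described in the abstract, the snippet below fits a one-hidden-layer network with quadratic activation, f(x) = Σ_k (w_kᵀx)², by plain gradient descent in a regime where the number of samples n is smaller than the number of weights k·d. The planted-teacher data model, dimensions, step size, and iteration count are assumptions made for this sketch, not the paper's experiments; in line with the paper's message, gradient descent typically drives the training error close to zero here despite the non-convex objective.

```python
import numpy as np

# Illustrative sketch (not the paper's experiments): over-parameterized shallow
# network with quadratic activation, f(x) = ||W x||^2, trained by gradient descent.
rng = np.random.default_rng(0)
d, k, n = 6, 4, 20                  # input dim, hidden width, samples; n < k*d parameters
X = rng.standard_normal((n, d))
W_star = rng.standard_normal((k, d))            # planted "teacher" weights (assumption)
y = np.sum((X @ W_star.T) ** 2, axis=1)         # labels generated by the teacher network

W = 0.1 * rng.standard_normal((k, d))           # small random initialization
lr = 1e-4
for _ in range(200_000):
    pred = np.sum((X @ W.T) ** 2, axis=1)       # f(x_i) = ||W x_i||^2
    resid = pred - y
    grad = (2.0 / n) * (W @ X.T * resid) @ X    # (2/n) * sum_i resid_i * (W x_i) x_i^T
    W -= lr * grad

print("final training MSE:", np.mean((np.sum((X @ W.T) ** 2, axis=1) - y) ** 2))
```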

Cited by 272 publications (219 citation statements). References 45 publications.
“…Therefore, for any hypothesis F(X) ∈ H_v, there exists an element W(X) ∈ W such that W(X) = W_{F_S}(X) + W_{F_V}(X), satisfying eqs. (47) and (48), and furthermore,…”
Section: Covering Bound for the Hypothesis Space of Deep Neural Networks
confidence: 92%
“…, d_n − 1} and for all k ∈ [n]. Why are we using complex features for our example instead of the real sines and cosines? Just because keeping track of which feature is an alias of which other feature is less notationally heavy for the complex case.…”
Section: Aliasing - The Core Issue in Overparameterized Models
confidence: 99%
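To make the aliasing point concrete, here is a small numerical check of my own (not taken from the citing paper): on n equispaced samples, a complex Fourier feature and the feature whose frequency is shifted by n take identical values, so an overparameterized Fourier-feature design matrix contains exactly duplicated columns.

```python
import numpy as np

# Aliasing illustration: exp(2*pi*i*(k+n)*t/n) equals exp(2*pi*i*k*t/n) whenever t is
# an integer sample index, so the two "different" features are indistinguishable.
n = 8
t = np.arange(n)                        # n equispaced sample points
k = 3
phi_k = np.exp(2j * np.pi * k * t / n)
phi_alias = np.exp(2j * np.pi * (k + n) * t / n)
print(np.allclose(phi_k, phi_alias))    # True: the alias agrees on every sample
```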
“…≥ 3-layer neural networks. Promising progress has been made in all of these areas, which we recap only briefly below. Regarding the first point, while the optimization landscape for deep neural networks is non-convex and complicated, several independent recent works (an incomplete list is [12][13][14][15][16][17][18]) have shown that overparameterization can make it more attractive, in the sense that optimization algorithms like stochastic gradient descent (SGD) are more likely to actually converge to a global minimum. These interesting insights are mostly unrelated to the question of generalization, and should be viewed as a coincidental benefit of overparameterization. Second, a line of recent work [19][20][21][22] characterizes the inductive biases of commonly used optimization algorithms, thus providing insight into the identity of the global minimum that is selected.…”
confidence: 99%
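The inductive-bias point can be illustrated with a classical toy case (my own example, not the specific settings of the works cited as [19]-[22]): for an underdetermined least-squares problem, gradient descent initialized at zero converges to the minimum-norm interpolating solution, so the optimizer itself determines which of the infinitely many global minima is selected.

```python
import numpy as np

# Inductive-bias illustration: gradient descent from zero on an underdetermined
# least-squares problem converges to the minimum-norm (pseudoinverse) solution.
rng = np.random.default_rng(1)
n, d = 5, 20                            # fewer equations than unknowns
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

w = np.zeros(d)
lr = 0.01
for _ in range(50_000):
    w -= lr * A.T @ (A @ w - b)         # gradient of 0.5 * ||A w - b||^2

w_min_norm = np.linalg.pinv(A) @ b      # minimum-norm interpolating solution
print(np.allclose(w, w_min_norm))       # True: GD "selects" the min-norm global minimum
```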
“…Assume that we are given data distributed according to a probability measure µ, where (x, y) ∼ µ, and … where R is often referred to as the risk or the risk function. In practice, the risk we have to minimize is the empirical risk, and it is a well-established fact that for neural networks the minimization problem in (1.6) is, in general, a non-convex minimization problem [33, 2, 35, 8]. As such, many search algorithms may get trapped at, or converge to, local minima which are not global minima [33].…”
Section: Introduction
confidence: 99%
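For context, the generic definitions behind this excerpt (the cited paper's equation (1.6) and its loss ℓ are not reproduced above, so the display below is a standard formulation rather than that paper's exact notation): the risk is the expected loss under µ and the empirical risk is its sample average,

$$R(f) = \mathbb{E}_{(x,y)\sim\mu}\big[\ell(f(x),y)\big], \qquad \widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^{n}\ell\big(f(x_i),y_i\big).$$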