2020
DOI: 10.48550/arXiv.2002.01523
Preprint

A Deep Conditioning Treatment of Neural Networks

Naman Agarwal, Pranjal Awasthi, Satyen Kale

Abstract: We study the role of depth in training randomly initialized overparameterized neural networks. We give the first general result showing that depth improves trainability of neural networks by improving the conditioning of certain kernel matrices of the input data. This result holds for arbitrary non-linear activation functions, and we provide a characterization of the improvement in conditioning as a function of the degree of non-linearity and the depth of the network. We provide versions of the result that hol…
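The effect described in the abstract can be illustrated numerically. The sketch below is not the paper's construction; it simply estimates, by Monte Carlo, the Gram matrix of the hidden representations of a randomly initialized fully connected network at several depths and prints its condition number. The choice of a centered, normalized sign activation, the layer width, the nearly collinear input data, and the helper names (centered_activation, hidden_features) are all illustrative assumptions, not details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def centered_activation(z):
    # sign is centered (E[sign(g)] = 0) and normalized (E[sign(g)^2] = 1) for a
    # standard Gaussian g; it is used here only as one convenient non-linearity.
    return np.sign(z)

def hidden_features(X, depth, width=4096):
    # Push the inputs through `depth` random layers with 1/sqrt(fan_in) scaling
    # so that the pre-activation variance stays O(1) at every layer.
    H = X
    for _ in range(depth):
        W = rng.standard_normal((H.shape[1], width)) / np.sqrt(H.shape[1])
        H = centered_activation(H @ W)
    return H

# Nearly collinear unit-norm inputs: the input Gram matrix is badly conditioned.
d, n = 50, 20
base = rng.standard_normal(d)
X = base + 0.05 * rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

for depth in [0, 1, 2, 4, 8]:
    H = hidden_features(X, depth)
    K = H @ H.T / H.shape[1]  # empirical kernel of the depth-`depth` features
    print(f"depth {depth}: condition number ~ {np.linalg.cond(K):.1e}")

Under these assumptions the printed condition numbers shrink as the depth grows: each random non-linear layer contracts the off-diagonal correlations of the kernel toward zero while leaving the diagonal fixed, which is the qualitative effect the abstract attributes to depth.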

Cited by 2 publications (2 citation statements)
References 23 publications

“…In Das et al [2019], it is shown that deep random neural networks (of depth ω(log(n))) with the sign activation function are hard to learn in the statistical query (SQ) model. This result was recently extended by Agarwal et al [2020] to other activation functions, including the ReLU function. While their results hold for networks of depth ω(log(n)) and for SQ algorithms, our results hold for depth-2 networks and for all algorithms.…”
Section: Related Work
confidence: 78%
“…Due to the empirical success of neural networks, there has been much effort to understand under what assumptions neural networks may be learned efficiently. This effort includes making assumptions on the input distribution [Li and Yuan, 2017, Brutzkus and Globerson, 2017, Du et al, 2017a,b, Du and Goel, 2018], the network's weights [Arora et al, 2014, Das et al, 2019, Agarwal et al, 2020, Goel and Klivans, 2017], or both [Janzamin et al, 2015, Tian, 2017, Bakshi et al, 2019]. Hence, distribution-specific learning of neural networks is a central problem.…”
Section: Intersections Of Halfspaces
confidence: 99%