2017
DOI: 10.48550/arxiv.1707.02444
Preprint

Global optimality conditions for deep neural networks

Abstract: We study the error landscape of deep linear and nonlinear neural networks with the squared error loss. Minimizing the loss of a deep linear neural network is a nonconvex problem, and despite recent progress, our understanding of this loss surface is still incomplete. For deep linear networks, we present necessary and sufficient conditions for a critical point of the risk function to be a global minimum. Surprisingly, our conditions provide an efficiently checkable test for global optimality, while such tests a…
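
The risk function referred to in the abstract is the squared-error loss of a deep linear network, i.e. a product of weight matrices applied to the data with no nonlinearity. The sketch below is only an illustration of that object, not the paper's optimality test: it builds the risk L(W_1, ..., W_H) = ||W_H ... W_1 X - Y||_F^2 / (2m) and numerically estimates the gradient to check whether a candidate point is (approximately) a critical point. All shapes, data, and helper names (risk, grad_norm) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the deep linear network risk studied in the paper,
# with a numerical check for critical points. Layer widths, data, and the
# finite-difference step are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: d_in -> 4 -> 3 -> d_out linear network, m samples.
m, d_in, d_out = 50, 5, 2
X = rng.standard_normal((d_in, m))
Y = rng.standard_normal((d_out, m))
dims = [d_in, 4, 3, d_out]  # layer widths, input to output
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]

def risk(Ws):
    """Squared-error risk of the deep linear network W_H ... W_1."""
    P = X
    for W in Ws:
        P = W @ P
    return 0.5 * np.sum((P - Y) ** 2) / m

def grad_norm(Ws, eps=1e-6):
    """Max-norm of a central finite-difference gradient over all weights."""
    g_max = 0.0
    for W in Ws:
        it = np.nditer(W, flags=["multi_index"])
        for _ in it:
            idx = it.multi_index
            old = W[idx]
            W[idx] = old + eps
            f_plus = risk(Ws)
            W[idx] = old - eps
            f_minus = risk(Ws)
            W[idx] = old
            g_max = max(g_max, abs(f_plus - f_minus) / (2 * eps))
    return g_max

print("risk at random point:", risk(Ws))
print("gradient max-norm  :", grad_norm(Ws))  # far from zero at a random point
```

A numerical check like this can only flag candidate critical points; the paper's contribution is a closed-form, efficiently checkable condition that decides whether such a critical point is in fact a global minimum.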

Cited by 22 publications (29 citation statements)
References 8 publications

“…For this reason, deep linear networks have been the subject of extensive theoretical analysis. A line of work (Kawaguchi, 2016;Hardt & Ma, 2016;Lu & Kawaguchi, 2017;Yun et al, 2017;Zhou & Liang, 2018;Laurent & von Brecht, 2018) studied the landscape properties of deep linear networks. Although it was established that all local minima are global under certain assumptions, these properties alone are still not sufficient to guarantee global convergence or to provide a concrete rate of convergence for gradient-based optimization algorithms.…”
Section: Related Work
confidence: 99%
“…The training loss of multilayer neural networks at differentiable local minima was examined in [38]. Yun et al. [44] very recently provided necessary and sufficient conditions to guarantee that certain critical points are also global minima.…”
Section: Introduction
confidence: 99%
“…Besides characterizing local minima, stronger claims on the stationary points can be proved for linear networks. Yun et al. [240] and Zou et al. [253] present necessary and sufficient conditions for a stationary point to be a global minimum.…”
Section: Global Landscape Analysis of Deep Network
confidence: 99%