2021
DOI: 10.48550/arxiv.2102.13233
Preprint
Spurious Local Minima Are Common for Deep Neural Networks with Piecewise Linear Activations

Abstract: In this paper, it is shown theoretically that spurious local minima are common for deep fully-connected networks and convolutional neural networks (CNNs) with piecewise linear activation functions and datasets that cannot be fitted by linear models. A motivating example is given to explain the reason for the existence of spurious local minima: each output neuron of deep fully-connected networks and CNNs with piecewise linear activations produces a continuous piecewise linear (CPWL) output, and different pieces…
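The CPWL property mentioned in the abstract can be illustrated numerically. Below is a minimal sketch (not from the paper; the weights are arbitrary example values) of a one-hidden-layer ReLU network with scalar input: away from the kinks where a pre-activation changes sign, the output is exactly linear, so its second differences vanish on any kink-free interval.

```python
import numpy as np

# Illustrative example: f(x) = w2 @ relu(W1 * x + b1) is continuous
# piecewise linear in x. Weights are arbitrary, chosen for the sketch.
W1 = np.array([1.0, -2.0, 0.5])   # hidden weights (3 units, scalar input)
b1 = np.array([0.0, 1.0, -0.5])   # hidden biases
w2 = np.array([1.0, 0.5, -1.0])   # output weights

def f(x):
    return w2 @ np.maximum(W1 * x + b1, 0.0)

# Kinks occur where some pre-activation W1*x + b1 crosses zero, i.e. at
# x in {0.0, 0.5, 1.0}. On [2, 3] no pre-activation changes sign, so f
# is linear there and its second differences are zero.
xs = np.linspace(2.0, 3.0, 5)
ys = np.array([f(x) for x in xs])
print(np.allclose(np.diff(ys, 2), 0.0))  # True: locally linear
```

Each linear piece corresponds to one fixed activation pattern of the hidden units, which is why the output of such a network is a CPWL function of its input.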

Year Published: 2022, 2023

Cited by 2 publications (2 citation statements)
References 31 publications
“…However, the previous results strongly depend on the data and are rather independent of architecture. For example, one major assumption is that the data cannot be perfectly fitted by a linear model (Yun et al 2018, He et al 2020, Liu 2021). Some other works explicitly construct data distributions (Safran and Shamir 2018, Venturi et al 2019).…”
mentioning
confidence: 99%
“…However, the previous results strongly depend on the data and are rather independent of architecture. For example, one major assumption is that the data cannot be perfectly fitted by a linear model (Yun et al, 2018; Liu, 2021; He et al, 2020). Some other works explicitly construct data distributions (Safran and Shamir, 2018; Venturi et al, 2019).…”
mentioning
confidence: 99%