This work derives the exact solutions of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that weight decay interacts strongly with the model architecture and can create bad minima in a network with more than one hidden layer, a behavior qualitatively different from that of a network with only one hidden layer. As an application, we also analyze stochastic networks and show that their prediction variance vanishes as the stochasticity, the width, or the depth tends to infinity.
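For concreteness, a standard formalization of this setting (a sketch under assumed notation; the symbols $W_i$, $\epsilon_i$, and $\gamma$ are illustrative choices, not necessarily the paper's exact definitions) minimizes the weight-decay-regularized loss of a depth-$D$ linear network whose hidden units are multiplied by random noise:

$$
L(W_1,\dots,W_D) \;=\; \mathbb{E}_{(x,y),\,\epsilon}\,\big\|\,W_D\big(\epsilon_{D-1}\odot W_{D-1}(\cdots(\epsilon_1\odot W_1 x))\big)-y\,\big\|^2 \;+\; \gamma\sum_{i=1}^{D}\|W_i\|_F^2,
$$

where $\odot$ denotes elementwise multiplication, the $\epsilon_i$ are i.i.d. random masks on the hidden layers (e.g., dropout) whose variance sets the stochasticity, and $\gamma>0$ is the weight-decay coefficient. The claims above then concern the global and local minimizers of $L$ as the depth $D$, the layer widths, or $\mathrm{Var}(\epsilon_i)$ grow.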