2020 International Joint Conference on Neural Networks (IJCNN)
DOI: 10.1109/ijcnn48605.2020.9206727
Loss Surface Modality of Feed-Forward Neural Network Architectures

Abstract: It has been argued in the past that high-dimensional neural networks do not exhibit local minima capable of trapping an optimisation algorithm. However, the relationship between loss surface modality and the neural architecture parameters, such as the number of hidden neurons per layer and the number of hidden layers, remains poorly understood. This study employs fitness landscape analysis to study the modality of neural network loss surfaces under various feed-forward architecture settings. An increase in the…

Cited by 9 publications (4 citation statements) | References 16 publications

Citation statements (ordered by relevance):
“…Finally, since distances do not decay fully, this could again imply that trajectories converge to nearby saddle points. We know that saddle points are ubiquitous in high-dimensional systems (both in general dynamical systems (Fyodorov and Khoruzhenko, 2016; Ben Arous et al, 2021) and empirically in neural network loss landscapes (Bosman et al, 2020b)), and that the convergence of GD slows down near any critical point, whether that is a minimum or a saddle. In fact, GD is notoriously bad at escaping saddle points (one of the reasons SGD is preferred in practice).…”
Section: Stability Analysis Near the Stationary State (Post-learning)mentioning
confidence: 99%
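The observation above, that plain gradient descent stalls near saddle points while stochasticity helps escape them, is easy to reproduce on a toy surface. The sketch below is an illustrative example only, not taken from any of the cited papers; the quadratic saddle f(x, y) = x² − y², the starting point, the learning rate, and the noise scale (a crude stand-in for SGD's stochasticity) are all assumptions.

```python
import numpy as np

# Toy saddle surface: f(x, y) = x^2 - y^2, with a saddle point at the origin.
def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

def descend(start, lr=0.01, steps=500, noise=0.0, seed=0):
    """Plain GD (noise=0) or noisy GD (a rough stand-in for SGD)."""
    rng = np.random.default_rng(seed)
    p = np.array(start, dtype=float)
    for _ in range(steps):
        g = grad(p)
        if noise > 0.0:
            g = g + rng.normal(scale=noise, size=2)  # inject gradient noise
        p = p - lr * g
    return p

# Start very close to the saddle point.
start = (1e-8, 1e-8)
print("plain GD :", descend(start, noise=0.0))  # remains near the origin after 500 steps
print("noisy GD :", descend(start, noise=0.1))  # escapes along the unstable (y) direction
```

Running the two calls side by side shows plain GD still within a tiny neighbourhood of the saddle after 500 iterations, while the noisy variant has moved far along the unstable direction.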
“…An alternative to random sampling for NN loss landscapes is the progressive gradient walk (PGW) [2,4,5], used in this study. The PGW algorithm is based on the idea of a random walk [23,36].…”
Section: Progressive Gradient Samplingmentioning
confidence: 99%
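As context for the progressive gradient walk referenced above, the following is a minimal sketch, assuming that a PGW, like a progressive random walk, takes a sequence of bounded steps, but sets each step's direction from the sign of the gradient and draws the per-dimension step magnitude uniformly from (0, max_step]. The loss function and parameter names are placeholders, not the exact setup of the cited studies.

```python
import numpy as np

def progressive_gradient_walk(loss_grad_fn, w0, max_step=0.1, n_steps=1000, seed=0):
    """Illustrative PGW: each step moves against the gradient sign with a
    random per-dimension magnitude in (0, max_step]. Returns the visited
    points and their losses for later landscape analysis."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    points, losses = [], []
    for _ in range(n_steps):
        loss, grad = loss_grad_fn(w)
        points.append(w.copy())
        losses.append(loss)
        step = rng.uniform(0.0, max_step, size=w.shape)  # random step magnitudes
        w = w - np.sign(grad) * step                     # gradient-informed direction
    return np.array(points), np.array(losses)

# Toy quadratic loss standing in for a neural network error surface.
def quadratic(w):
    return 0.5 * np.sum(w ** 2), w

points, losses = progressive_gradient_walk(quadratic, w0=np.ones(10))
print(losses[:5], losses[-1])
```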
“…LGCs were successfully used to investigate the minima associated with different loss functions [2] as well as NN architectures [5]. This study is a natural extension of [2,5], where the focus is shifted to the activation functions and their effect on the loss landscapes.…”
Section: Loss-gradient Clouds Loss-gradient Clouds (mentioning
confidence: 99%
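For readers unfamiliar with loss-gradient clouds: the sketch below assumes that an LGC is a scatter plot of loss values against gradient magnitudes for sampled points (e.g. points visited by gradient walks), so that clusters with near-zero gradient and low loss suggest minima, while near-zero gradient at higher loss suggests saddles or plateaus. The sampling and plotting details here are assumptions for illustration, not the cited authors' exact procedure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy quadratic loss standing in for a neural network error surface.
def quadratic(w):
    return 0.5 * np.sum(w ** 2), w

# Sample points uniformly in a box (the cited studies sample via gradient walks).
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(2000, 10))

# Loss-gradient cloud: loss plotted against gradient magnitude per point.
losses = np.array([quadratic(w)[0] for w in points])
grad_norms = np.array([np.linalg.norm(quadratic(w)[1]) for w in points])

plt.scatter(grad_norms, losses, s=5, alpha=0.4)
plt.xlabel("gradient magnitude")
plt.ylabel("loss")
plt.title("Loss-gradient cloud (toy example)")
plt.show()
```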
“…Bosman et al [16][17][18][19] applied and adapted standard fitness landscape analysis techniques to error landscapes. Studies include: the influence of search space boundaries on the landscape analysis [16], the influence of regularisation on error surfaces [17], the influence of architecture settings on modality of the landscape [18], and the effect of different loss functions on the basins of attraction [19].…”
Section: Error Landscapesmentioning
confidence: 99%