2021
DOI: 10.48550/arxiv.2107.10370
Preprint
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II

Abstract: We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network. We make use of the rich symmetry structure to develop a novel set of tools for studying families of spurious minima. In contrast to existing approaches which operate in limiting regimes, our technique directly addresses the nonconvex loss landscape for a finite number of inputs d and neurons k, and provides analytic, rather than heuristic, in…

Cited by 1 publication (7 citation statements)
References 39 publications
“…The FPS result is used in [3], together with results from the representation theory of the symmetric group, to obtain precise results on the Hessian spectrum for several families of spurious minima in shallow neural networks (valid for arbitrarily large k). In [5], these results are extended to two-layer ReLU networks, where it is shown that to order O(k^{-1/2}) the spectra are identical for the global minima and several families of spurious minima. All results to this point assume that the number of inputs d to the network is greater than or equal to the number of neurons k. In [6], the overparametrized case k > d is analysed and it is shown, using FPS methods and representation theory, that the addition of one or two neurons annihilates certain families of spurious minima of types I and II (as defined in [6]).…”
Section: Outline of Paper and Main Results
confidence: 93%
“…In the concluding comments, we return to the original motivating problem about the creation and annihilation of spurious minima, indicate how the results of the paper can be used to understand this phenomenon, and discuss related current and proposed developments [1][2][3][4][5][6]. Article [2] identifies spurious minima as examples of symmetry breaking in the student-teacher model and gives an extensive numerical study of the phenomenon in a wide range of settings. In [4], several infinite families of critical points of spurious minima are constructed and it is shown that these critical points may be represented by convergent fractional power series (FPS) in 1/√k (k is the number of neurons, viewed as a real parameter).…”
Section: Outline of Paper and Main Results
confidence: 99%
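The citation statements above refer to representing critical points by convergent fractional power series in 1/√k. As a minimal illustration of what evaluating such a truncated series looks like, here is a short sketch; the coefficients below are hypothetical placeholders, not the analytically derived coefficients from the cited papers:

```python
import numpy as np

# Hypothetical coefficients (c0, c1, c2, c3) for illustration only;
# the actual FPS coefficients in [4] are obtained analytically.
coeffs = [1.0, -0.5, 0.25, -0.125]

def fps(k, coeffs):
    """Evaluate a truncated fractional power series in x = 1/sqrt(k)."""
    x = 1.0 / np.sqrt(k)
    return sum(c * x**n for n, c in enumerate(coeffs))

# The first omitted term is of order O(k^{-2}), so truncations
# stabilize quickly as the number of neurons k grows.
for k in [10, 100, 1000]:
    print(k, fps(k, coeffs))
```

As k → ∞ the value approaches the leading coefficient c0, mirroring the statement that, to order O(k^{-1/2}), expansions for different families share the same leading behaviour.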