Preprint, 2021
DOI: 10.48550/arxiv.2110.00683

Learning through atypical "phase transitions" in overparameterized neural networks

Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, et al.

Abstract: Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that escape the bias-variance predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered systems to analyti…

Cited by 4 publications (9 citation statements). References 29 publications.
“…Also, with few exceptions, the solutions we find can be connected by paths of (near-)zero error with a single bend. Overall, our results are compatible with the analysis of the geometry of the space of solutions in binary, shallow networks reported in Baldassi et al. (2021a) and (2021b), according to which efficient algorithms target large connected structures of solutions, with the more robust (flatter) ones at the center, radiating out into progressively sharper ones.…”
Section: Introduction (supporting)
Confidence: 88%
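The single-bend connectivity test quoted above can be illustrated with a short numerical check. The sketch below is only illustrative and is not the cited authors' procedure: it assumes two flattened solution vectors w_a and w_b, a user-supplied loss_fn, and a hand-chosen bend point w_bend (in practice the bend point would itself be optimized to minimize the barrier along the path).

import numpy as np

def path_error(w_a, w_b, w_bend, loss_fn, n_points=50):
    # Evaluate the loss along the piecewise-linear path w_a -> w_bend -> w_b
    # and return the maximum value encountered. A value near zero suggests the
    # two solutions are connected by a single-bend path of (near-)zero error.
    losses = []
    for t in np.linspace(0.0, 1.0, n_points):
        losses.append(loss_fn((1.0 - t) * w_a + t * w_bend))  # first leg
        losses.append(loss_fn((1.0 - t) * w_bend + t * w_b))  # second leg
    return max(losses)

# Hypothetical usage: a naive bend point is the midpoint of the two solutions.
# barrier = path_error(w_a, w_b, 0.5 * (w_a + w_b), loss_fn)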
“…Unfortunately, it is challenging to extend this approach to deeper networks. Other groups are studying the role of overparametrisation and related phenomena, such as double descent, in the regime of lazy training [14][15][16][17][18][19][20]. The statistical physics of kernel learning (originally started in [21]) has also undergone a revival in the last few years [22], mainly due to the discovery of the Neural Tangent Kernel (NTK) limit of deep neural networks: a mathematical equivalence between neural networks and a certain kernel that arises in the limit of large layer size [23,24].…”
Section: Introduction (mentioning)
Confidence: 99%
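For context, the NTK equivalence mentioned in the quoted passage can be stated compactly (this is the standard definition, not a result specific to the citing work). For a network output f(x; \theta), the empirical tangent kernel is

\Theta(x, x') \;=\; \nabla_\theta f(x;\theta)^{\top} \, \nabla_\theta f(x';\theta),

and in the infinite-width (lazy-training) limit \Theta stays essentially constant during training, so gradient descent on the network reduces to kernel regression with the fixed kernel \Theta.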
“…Theoretical studies of neural networks have become increasingly important in recent years [1][2][3], as deep neural networks are widely used across both scientific and industrial communities. One of the most powerful theoretical tools is the replica method, which can derive equilibrium properties of neural networks (systems of interacting neurons or synapses), such as phase diagrams [4][5][6][7], storage capacities [8][9][10][11], and even the large-deviation behavior of learning algorithms [12][13][14]. Intuitively, the replica method introduces n (an integer) copies of the original system.…”
Section: Introduction (mentioning)
Confidence: 99%
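The "n copies" mentioned in the quoted passage refers to the standard replica trick (a generic identity, not specific to the citing work): the disorder-averaged log-partition function is recovered from integer moments of Z via

\overline{\ln Z} \;=\; \lim_{n \to 0} \frac{\overline{Z^n} - 1}{n},

where the overline denotes the average over the quenched disorder (here, the training data). \overline{Z^n} is computed for integer n as the partition function of n coupled replicas of the system, and the result is then analytically continued to n \to 0.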
“…However, the region of the solution space accessed by practical algorithms does not belong to the equilibrium, hard-to-reach isolated parts, but to subdominant dense parts [12,13]. These dense parts have further been shown to have good generalization properties [14,30], providing a new paradigm for understanding deep learning. In this work, we show how the algorithmic instability is connected to the instability of the replica symmetric (RS) solution (or replicon mode) of the model.…”
Section: Introduction (mentioning)
Confidence: 99%
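As a schematic reminder of the general replica formalism (not the specific computation of the citing work), RS stability is checked by expanding the replicated free energy to second order around the RS saddle point, q_{ab} = q + \delta q_{ab}, and examining the eigenvalue of the resulting Hessian associated with replica-symmetry-breaking fluctuations (the replicon mode):

\lambda_{\mathrm{replicon}} \geq 0 \quad \Longrightarrow \quad \text{RS solution locally stable},

while \lambda_{\mathrm{replicon}} < 0 (a de Almeida-Thouless-type instability) signals that replica symmetry must be broken.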