2016
DOI: 10.1073/pnas.1608103113

Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes

Abstract: In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even …

Cited by 134 publications (207 citation statements) · References 32 publications
“…While some hype does exist, DL has undeniably delivered unrivaled performance and solved exciting problems that have been difficult for artificial intelligence (AI) for many years (LeCun et al.; Silver et al.). DL algorithms have shown a generational leap in predictive capability which some have argued is unreasonable (Baldassi et al.; C. Sun et al.). Since 2012, as an indication of these advances, DL has emerged as a dominant force that breaks records in most machine learning contests where it is applicable (Schmidhuber).…”
Section: Motivations
mentioning
confidence: 99%
“…These regions are defined in terms of the volume of the weights around a minimizer which does not lead to an increase of the loss value (e.g., the number of errors) [6]. For discrete weights this notion reduces to the so-called Local Entropy [7] of a minimizer.…”
mentioning
confidence: 99%
“…For the numerical results, we have used simulated annealing on a system with K = 32 (K = 33) for the ReLU (sign) activations, respectively, and N = K^2 · 10^3. We have simulated a system of y interacting replicas that is able to sample from the local-entropic measure [6] with the RRR Monte Carlo method [21], ensuring that the annealing process was sufficiently slow that at the end of the simulation all replicas were solutions, and controlling the interaction such that the average overlap between replicas was equal to q_1 within a tolerance of 0.01. The results were averaged over 20 samples.…”
mentioning
confidence: 99%
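For orientation, here is a minimal Python sketch of the coupled-replica annealing scheme this excerpt describes, applied to a toy binary perceptron. It is an assumption-laden simplification: it uses plain Metropolis updates with a fixed attractive coupling gamma instead of the RRR Monte Carlo method [21], it does not adaptively control the interaction to pin the overlap at q_1, and every size and hyperparameter below is an illustrative placeholder rather than a value from the quoted study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: a binary perceptron learning a random teacher rule.
# N odd so that +/-1 dot products can never be exactly zero.
N, P = 101, 120
X = rng.choice([-1, 1], size=(P, N))
teacher = rng.choice([-1, 1], size=N)
y_true = np.sign(X @ teacher)

def energy(w):
    """Number of misclassified patterns (the 'number of errors' loss)."""
    return int(np.sum(np.sign(X @ w) != y_true))

def replicated_annealing(y_reps=3, steps=20_000, beta0=0.1, beta1=2.0, gamma=2.0):
    """Metropolis annealing of y_reps replicas with an attractive coupling.

    Each replica feels its own error count plus an interaction energy
    -(gamma / N) * sum_{s != r} w_r . w_s that rewards agreement with the
    other replicas, a crude stand-in for the local-entropic measure.
    """
    W = rng.choice([-1, 1], size=(y_reps, N))
    for t in range(steps):
        beta = beta0 + (beta1 - beta0) * t / steps  # linear annealing schedule
        r = rng.integers(y_reps)                    # pick a replica
        i = rng.integers(N)                         # pick a weight to flip
        w_new = W[r].copy()
        w_new[i] = -w_new[i]
        dE = energy(w_new) - energy(W[r])
        # Flipping w_r[i] changes each overlap w_r . w_s by -2 * w_r[i] * w_s[i].
        others = np.delete(np.arange(y_reps), r)
        d_coupling = -(gamma / N) * np.sum(-2 * W[r, i] * W[others, i])
        if rng.random() < np.exp(-beta * (dE + d_coupling)):
            W[r] = w_new
    return W

W = replicated_annealing()
print("errors per replica:", [energy(w) for w in W])
print("mean pairwise overlap:",
      np.mean([(W[a] @ W[b]) / N
               for a in range(len(W)) for b in range(a + 1, len(W))]))
```

Monitoring the printed pairwise overlap is the hook where the quoted study's overlap control (holding it at q_1 by tuning the interaction) would plug in.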
“…More precisely, the local (free) entropy of a certain configuration of the weights w* is defined as [14]:…”
Section: Replicated Systems and Overfitting
mentioning
confidence: 99%
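The equation itself is cut off in the excerpt. For reference, a reconstruction of the local free entropy along the lines of the original paper (ref. [14] here) follows; the exact normalization and the choice of distance term are assumptions and may differ in detail from the quoting work:

```latex
% Local free entropy of a reference configuration w^* (reconstruction;
% the coupling \gamma and the distance d are generic placeholders):
\Phi(w^{*}, \beta, \gamma) \;=\; \log \sum_{w} \exp\!\bigl( -\beta\, E(w) \;-\; \gamma\, d(w, w^{*}) \bigr)
```

Here E(w) is the training loss (e.g., the number of errors) and d(w, w*) a distance between weight configurations (Hamming distance for binary weights); a large Φ signals that w* sits inside a wide, flat region dense with low-loss configurations.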
“…where L_tot^(r) is the total loss of replica r. It is important at this stage to observe that the canonical physical description presupposes a noisy optimization process where the amount of noise is regulated by some inverse temperature β, while in this work (following ref. [14]) we will be relying on the noise provided by SGD instead, thereby using the mini-batch size and the learning rate as "equivalent" control parameters. Relatedly, we should also note that, although the interaction term is purely attractive, the replicas will not collapse unless the coupling coefficient λ is very large, due to the presence of noise in the optimization.…”
Section: Replicated Systems and Overfitting
mentioning
confidence: 99%
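A minimal Python (PyTorch) sketch of the replicated-SGD idea in this excerpt is below. It is not the quoting paper's implementation: for simplicity each replica is attracted to the replica mean (an elastic-averaging-style coupling, so L_tot = Σ_r L^(r) + λ Σ_r ||w_r − w̄||²), the mini-batch noise plays the role of temperature as the quote describes, and the toy regression task and all hyperparameters are invented for illustration.

```python
import torch

torch.manual_seed(0)

# Toy regression data; sizes and hyperparameters are placeholders.
X = torch.randn(256, 10)
y = torch.randn(256, 1)

n_replicas, lam, lr, epochs = 3, 0.1, 0.05, 200
replicas = [torch.nn.Linear(10, 1) for _ in range(n_replicas)]
opts = [torch.optim.SGD(m.parameters(), lr=lr) for m in replicas]
loss_fn = torch.nn.MSELoss()

for epoch in range(epochs):
    # Replica center of mass, detached so it acts as a fixed target this step.
    with torch.no_grad():
        center = [sum(m.weight for m in replicas) / n_replicas,
                  sum(m.bias for m in replicas) / n_replicas]
    for m, opt in zip(replicas, opts):
        # Small mini-batches supply the noise that replaces the temperature.
        idx = torch.randint(0, 256, (32,))
        loss = loss_fn(m(X[idx]), y[idx])
        # Purely attractive elastic coupling toward the replica center.
        coupling = ((m.weight - center[0]).pow(2).sum()
                    + (m.bias - center[1]).pow(2).sum())
        total = loss + lam * coupling
        opt.zero_grad()
        total.backward()
        opt.step()
```

Consistent with the quote, with moderate λ and noisy gradients the replicas stay spread around the center rather than collapsing onto a single configuration; cranking λ up (or the batch size up and the learning rate down) drives them together.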