2016
DOI: 10.1073/pnas.1608103113

Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes

Abstract: In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even …

Cited by 134 publications (207 citation statements) · References 32 publications
“…While some hype does exist, DL has undeniably delivered unrivaled performance and solved exciting problems that have been difficult for artificial intelligence (AI) for many years (LeCun et al.; Silver et al.). DL algorithms have shown a generational leap in predictive capability which some have argued is unreasonable (Baldassi et al.; C. Sun et al.). Since 2012, as an indication of these advances, DL has emerged as a dominant force that breaks records in most machine learning contests where it is applicable (Schmidhuber).…”
Section: Motivations
mentioning
confidence: 99%
“…These regions are defined in terms of the volume of the weights around a minimizer which does not lead to an increase of the loss value (e.g., the number of errors) [6]. For discrete weights this notion reduces to the so-called Local Entropy [7] of a minimizer.…”
mentioning
confidence: 99%
“…For the numerical results, we have used simulated annealing on a system with K = 32 (K = 33) for the ReLU (sign) activations, respectively, and N = K^2 · 10^3. We have simulated a system of y interacting replicas that is able to sample from the local-entropic measure [6] with the RRR Monte Carlo method [21], ensuring that the annealing process was sufficiently slow that at the end of the simulation all replicas were solutions, and controlling the interaction such that the average overlap between replicas was equal to q_1 within a tolerance of 0.01. The results were averaged over 20 samples.…”
mentioning
confidence: 99%
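For orientation, here is a minimal Python sketch of the coupled-replica annealing scheme this excerpt describes, applied to a toy binary perceptron. It is an assumption-laden simplification: it uses plain Metropolis updates with a fixed attractive coupling gamma instead of the RRR Monte Carlo method [21], it does not adaptively control the interaction to pin the overlap at q_1, and every size and hyperparameter below is an illustrative placeholder rather than a value from the quoted study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: a binary perceptron learning a random teacher rule.
# N odd so that +/-1 dot products can never be exactly zero.
N, P = 101, 120
X = rng.choice([-1, 1], size=(P, N))
teacher = rng.choice([-1, 1], size=N)
y_true = np.sign(X @ teacher)

def energy(w):
    """Number of misclassified patterns (the 'number of errors' loss)."""
    return int(np.sum(np.sign(X @ w) != y_true))

def replicated_annealing(y_reps=3, steps=20_000, beta0=0.1, beta1=2.0, gamma=2.0):
    """Metropolis annealing of y_reps replicas with an attractive coupling.

    Each replica feels its own error count plus an interaction energy
    -(gamma / N) * sum_{s != r} w_r . w_s that rewards agreement with the
    other replicas, a crude stand-in for the local-entropic measure.
    """
    W = rng.choice([-1, 1], size=(y_reps, N))
    for t in range(steps):
        beta = beta0 + (beta1 - beta0) * t / steps  # linear annealing schedule
        r = rng.integers(y_reps)                    # pick a replica
        i = rng.integers(N)                         # pick a weight to flip
        w_new = W[r].copy()
        w_new[i] = -w_new[i]
        dE = energy(w_new) - energy(W[r])
        # Flipping w_r[i] changes each overlap w_r . w_s by -2 * w_r[i] * w_s[i].
        others = np.delete(np.arange(y_reps), r)
        d_coupling = -(gamma / N) * np.sum(-2 * W[r, i] * W[others, i])
        if rng.random() < np.exp(-beta * (dE + d_coupling)):
            W[r] = w_new
    return W

W = replicated_annealing()
print("errors per replica:", [energy(w) for w in W])
print("mean pairwise overlap:",
      np.mean([(W[a] @ W[b]) / N
               for a in range(len(W)) for b in range(a + 1, len(W))]))
```

Monitoring the printed pairwise overlap is the hook where the quoted study's overlap control (holding it at q_1 by tuning the interaction) would plug in.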
“…More precisely, the local (free) entropy of a certain configuration of the weights w* is defined as [14]:…”
Section: Replicated Systems and Overfitting
mentioning
confidence: 99%
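The equation itself is cut off in the excerpt. For reference, a reconstruction of the local free entropy along the lines of the original paper (ref. [14] here) follows; the exact normalization and the choice of distance term are assumptions and may differ in detail from the quoting work:

```latex
% Local free entropy of a reference configuration w^* (reconstruction;
% the coupling \gamma and the distance d are generic placeholders):
\Phi(w^{*}, \beta, \gamma) \;=\; \log \sum_{w} \exp\!\bigl( -\beta\, E(w) \;-\; \gamma\, d(w, w^{*}) \bigr)
```

Here E(w) is the training loss (e.g., the number of errors) and d(w, w*) a distance between weight configurations (Hamming distance for binary weights); a large Φ signals that w* sits inside a wide, flat region dense with low-loss configurations.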
“…where L_tot^(r) is the total loss of replica r. It is important at this stage to observe that the canonical physical description presupposes a noisy optimization process where the amount of noise is regulated by some inverse temperature β, while in this work (following ref. [14]) we will be relying on the noise provided by SGD instead, thereby using the mini-batch size and the learning rate as "equivalent" control parameters. Relatedly, we should also note that, although the interaction term is purely attractive, the replicas will not collapse unless the coupling coefficient λ is very large, due to the presence of noise in the optimization.…”
Section: Replicated Systems and Overfitting
mentioning
confidence: 99%
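A minimal Python (PyTorch) sketch of the replicated-SGD idea in this excerpt is below. It is not the quoting paper's implementation: for simplicity each replica is attracted to the replica mean (an elastic-averaging-style coupling, so L_tot = Σ_r L^(r) + λ Σ_r ||w_r − w̄||²), the mini-batch noise plays the role of temperature as the quote describes, and the toy regression task and all hyperparameters are invented for illustration.

```python
import torch

torch.manual_seed(0)

# Toy regression data; sizes and hyperparameters are placeholders.
X = torch.randn(256, 10)
y = torch.randn(256, 1)

n_replicas, lam, lr, epochs = 3, 0.1, 0.05, 200
replicas = [torch.nn.Linear(10, 1) for _ in range(n_replicas)]
opts = [torch.optim.SGD(m.parameters(), lr=lr) for m in replicas]
loss_fn = torch.nn.MSELoss()

for epoch in range(epochs):
    # Replica center of mass, detached so it acts as a fixed target this step.
    with torch.no_grad():
        center = [sum(m.weight for m in replicas) / n_replicas,
                  sum(m.bias for m in replicas) / n_replicas]
    for m, opt in zip(replicas, opts):
        # Small mini-batches supply the noise that replaces the temperature.
        idx = torch.randint(0, 256, (32,))
        loss = loss_fn(m(X[idx]), y[idx])
        # Purely attractive elastic coupling toward the replica center.
        coupling = ((m.weight - center[0]).pow(2).sum()
                    + (m.bias - center[1]).pow(2).sum())
        total = loss + lam * coupling
        opt.zero_grad()
        total.backward()
        opt.step()
```

Consistent with the quote, with moderate λ and noisy gradients the replicas stay spread around the center rather than collapsing onto a single configuration; cranking λ up (or the batch size up and the learning rate down) drives them together.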