In the neural networks literature, "Hebbian learning" traditionally refers to the procedure by which the Hopfield model and its generalizations "store" archetypes (i.e., definite patterns that are experienced just once to form the synaptic matrix). However, the term "learning" in machine learning refers to the ability of the machine to extract features from the supplied dataset (e.g., made of blurred examples of these archetypes) in order to build its own representation of the unavailable archetypes. Here, given a sample of examples, we define a supervised learning protocol based on Hebb's rule by which the Hopfield network can infer the archetypes. By an analytical inspection, we detect the correct control parameters (including the size and quality of the dataset) that tune the system's performance and we depict its phase diagram. We also prove that, for structureless datasets, the Hopfield model equipped with this supervised learning rule is equivalent to a restricted Boltzmann machine, and this suggests an optimal and interpretable training routine. Finally, this approach is generalized to structured datasets: we highlight an ultrametric-like organization (reminiscent of replica symmetry breaking) in the analyzed datasets and, consequently, we introduce an additional "broken-replica hidden layer" for its (partial) disentanglement, which is shown to improve MNIST classification from 75% to 95% and to offer a new perspective on deep architectures.
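To make the protocol concrete, here is a minimal numerical sketch of a supervised Hebbian rule of this kind, assuming the simplest possible implementation (examples grouped by archetype label, averaged within each class, and the class averages fed to Hebb's rule); it is not the paper's exact definition, and the sizes N, K, M and the quality parameter r are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, M = 200, 3, 40   # neurons, archetypes, examples per archetype (arbitrary)
r = 0.8                # example quality: each bit matches the archetype w.p. (1 + r) / 2

# Archetypes (never supplied to the network directly)
xi = rng.choice([-1, 1], size=(K, N))

# Noisy examples: each archetype bit is flipped independently w.p. (1 - r) / 2
chi = np.where(rng.random((K, M, N)) < (1 + r) / 2, 1, -1)
eta = xi[:, None, :] * chi

# Supervised Hebbian couplings: average the examples within each class (label),
# then apply Hebb's rule to the class averages
eta_bar = eta.mean(axis=1)              # (K, N) empirical archetype estimates
J = eta_bar.T @ eta_bar / N
np.fill_diagonal(J, 0.0)

# Zero-temperature retrieval dynamics from a corrupted cue of archetype 0
sigma = xi[0] * np.where(rng.random(N) < 0.9, 1, -1)   # ~10% of bits flipped
for _ in range(20):
    sigma = np.sign(J @ sigma)

print("overlap with archetype 0:", sigma @ xi[0] / N)
```

Averaging the examples within a class before applying Hebb's rule is what makes the protocol supervised: the labels only decide which examples are pooled together, while the archetypes themselves remain hidden.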
Hebb's learning traces its origins to Pavlov's classical conditioning; however, while the former has been extensively modeled in the past decades (e.g., by the Hopfield model and countless variations on the theme), modeling of the latter has remained largely unaddressed so far. Furthermore, a mathematical bridge connecting these two pillars is entirely lacking. The main difficulty toward this goal lies in the intrinsically different scales of the information involved: Pavlov's theory is about correlations between concepts that are (dynamically) stored in the synaptic matrix, as exemplified by the celebrated experiment starring a dog and a ringing bell; conversely, Hebb's theory is about correlations between pairs of neurons, as summarized by the famous statement that neurons that fire together wire together. In this letter, we rely on stochastic process theory to prove that, as long as the timescales of neurons and synapses remain widely separated, Pavlov's mechanism spontaneously takes place and ultimately gives rise to synaptic weights that recover the Hebbian kernel.
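The following toy simulation illustrates the timescale-separation idea only, not the paper's stochastic-process proof: fast Glauber neural dynamics are driven by two alternating stimuli, while the couplings relax toward the instantaneous neuron-neuron correlations on the much slower timescale tau_syn; the field strength, stimulus schedule, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 50
tau_syn = 1000.0   # synaptic timescale, much slower than the (unit) neural timescale
beta = 2.0         # inverse noise level of the fast neural dynamics
T_steps = 20000

# Two stimuli repeatedly presented to the network (think "bell" and "food")
patterns = rng.choice([-1, 1], size=(2, N))

J = np.zeros((N, N))
sigma = rng.choice([-1, 1], size=N)

for t in range(T_steps):
    # Fast neural dynamics: Glauber updates driven by the couplings plus the current stimulus
    stim = patterns[(t // 200) % 2]          # alternate the two stimuli every 200 steps
    h = J @ sigma + 2.0 * stim
    sigma = np.where(rng.random(N) < 1.0 / (1.0 + np.exp(-2.0 * beta * h)), 1, -1)

    # Slow synaptic dynamics: relax toward the instantaneous neuron-neuron correlations
    J += (np.outer(sigma, sigma) / N - J) / tau_syn
    np.fill_diagonal(J, 0.0)

# The learned couplings end up proportional to the Hebbian kernel of the stimuli
J_hebb = patterns.T @ patterns / N
np.fill_diagonal(J_hebb, 0.0)
print("correlation(J, Hebbian kernel):",
      np.corrcoef(J.ravel(), J_hebb.ravel())[0, 1])
```

Because tau_syn is much larger than the neural update time, J effectively time-averages the neural correlations over many stimulus presentations, which is why it converges to (a multiple of) the Hebbian kernel of the presented patterns.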
The gap between the huge volumes of data needed to train artificial neural networks and the relatively small amount of data needed by their biological counterparts is a central puzzle in machine learning. Here, inspired by biological information processing, we introduce a generalized Hopfield network where pairwise couplings between neurons are built according to Hebb's prescription for on-line learning and also allow for (suitably stylized) off-line sleeping mechanisms. Moreover, in order to retain a learning framework, the patterns are not assumed to be available here; instead, the network experiences solely a dataset made of a sample of noisy examples of each pattern. We analyze the model with statistical-mechanics tools and obtain a quantitative picture of its capabilities as functions of its control parameters: the resulting network is an associative memory for pattern recognition that learns from examples on-line, generalizes, and optimizes its storage capacity by off-line sleeping. Remarkably, the sleeping mechanisms always significantly reduce (up to ≈ 90%) the dataset size required to generalize correctly; furthermore, there are memory loads that are prohibitive for Hebbian networks without sleeping (no matter the size and quality of the provided examples) but are easily handled by the present "rested" neural networks.

Contents
1 Introduction
2 The Learning and Dreaming (LaD) model
2.1 The LaD neural network: definitions and generalities
2.1.1 The LaD neural network: cost function, partition function and statistical pressure
2.1.2 The LaD neural network: control parameters and order parameters
2.2 The LaD neural network: replica symmetric scenario
2.2.1 The LaD neural network: asymptotic behavior in the thermodynamic limit
2.2.2 The LaD neural network: emergent computational skills
2.2.3 The LaD neural network: the nature of the computational phase transitions
2.3 The LaD neural network: dataset savings
3 Conclusions
A Maximum entropy inference to obtain the LaD cost function
B The decorrelation matrix and the dreaming time: a toy example
C On the entropy of the dataset ρ
D Guerra scheme for the LaD's quenched statistical pressure
E Self-consistent equations
F Noiseless limits of the self-consistent equations
Bibliography
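As a purely illustrative companion to the LaD abstract above (the model's actual cost function is derived in the paper, cf. Appendix A), the sketch below builds Hebbian couplings from class-averaged examples and then applies a dreaming-like regularization of the kernel, J(t) ∝ (1 + t) η̄ᵀ(I + tC)⁻¹ η̄, a form borrowed from related "dreaming" Hopfield models; it should be read as a stand-in for the off-line sleeping mechanism, not as the LaD definition, and every parameter value is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)

N, K, M = 200, 20, 20   # neurons, archetypes, examples per archetype (arbitrary)
r = 0.7                 # example quality
t_dream = 10.0          # "dreaming time"; t_dream = 0 recovers the plain Hebbian kernel

xi = rng.choice([-1, 1], size=(K, N))                        # hidden archetypes
flips = np.where(rng.random((K, M, N)) < (1 + r) / 2, 1, -1)
eta_bar = (xi[:, None, :] * flips).mean(axis=1)              # class-averaged examples

# On-line part: Hebbian couplings built from the examples only
C = eta_bar @ eta_bar.T / N                                  # (K, K) overlap matrix
J_hebb = eta_bar.T @ eta_bar / N

# Stylized off-line "sleeping": dreaming-like regularization of the kernel,
# J(t) = (1 + t) * eta_bar^T (I + t C)^{-1} eta_bar / N   (borrowed from related models)
A = (1 + t_dream) * np.linalg.inv(np.eye(K) + t_dream * C)
J_sleep = eta_bar.T @ A @ eta_bar / N

def retrieve(J, cue, steps=30):
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(J @ s - np.diag(J) * s)   # exclude self-couplings
    return s

cue = xi[0] * np.where(rng.random(N) < 0.85, 1, -1)          # ~15% of bits flipped
for name, coupling in [("Hebb only", J_hebb), ("Hebb + sleep", J_sleep)]:
    m = retrieve(coupling, cue) @ xi[0] / N
    print(f"{name}: overlap with archetype 0 = {m:+.2f}")
```

Varying t_dream, K and M in this sketch gives only a rough, qualitative feel for the role the paper attributes to sleeping; the quantitative picture is the replica-symmetric phase diagram derived in the text.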