Exact Solution for On-Line Learning in Multilayer Neural Networks

Saad, David; Solla, Sara A.

doi:10.1103/physrevlett.74.4337

Cited by 161 publications

(262 citation statements)

References 5 publications

Citation statements: 260 mentioning

“…Since having a discrete teacher is merely a special case, not using the knowledge that the teacher is confined to a discrete set of values gives the well-known results: an exponential decay in the case of a continuous rule (on-line learning [12,13]) and a power-law decay in the case of a binary rule (on-line and off-line learning [14-18]). The way to gain from the knowledge of the discrete nature of the weights is at the center of our work, and it is based on having, in addition, a discrete student W^S derived from the continuous one using the following clipping procedure.…”

Section: B. Dynamics of the Weights (mentioning; classification confidence: 99%)

Rosen-Zvi, Kanter (2001). Training a perceptron in a discrete weight space. Phys. Rev. E.


On-line and batch learning of a perceptron in a discrete weight space, where each weight can take 2L + 1 different values, are examined analytically and numerically. The learning algorithm is based on training a continuous perceptron and predicting with the clipped weights. The learning is described by a new set of order parameters, composed of the overlaps between the teacher and the continuous/clipped students. Different scenarios are examined, among them on-line learning with discrete/continuous transfer functions and off-line Hebb learning. The generalization error of the clipped weights decays asymptotically as exp(−Kα²) for on-line learning with binary activation functions and as exp(−e^{|λ|α}) with continuous activation functions, where α is the number of examples divided by N, the size of the input vector, and K is a positive constant that decays linearly with 1/L. For finite N and L, perfect agreement between the discrete student and the teacher is obtained for α ∝ L ln(NL). A crossover to the generalization error ∝ 1/α, characteristic of continuous weights with binary output, is obtained for synaptic depth L > O(√N).
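A minimal sketch of the train-continuous, predict-clipped idea, assuming a teacher whose weights are drawn uniformly from {−L, …, L}, isotropic Gaussian inputs, and a plain on-line Hebb rule for the continuous student (one of several scenarios; chosen here for brevity). The clipping rule (rescale to the teacher's expected mean square L(L+1)/3, round, clamp to ±L) is a hypothetical stand-in, not the procedure of Rosen-Zvi and Kanter, and the names `clip_weights`, `overlap_error` and `train_and_clip` are illustrative only.

```python
import math
import random


def clip_weights(w, L, target_ms):
    """Hypothetical clipping rule (assumption, not the paper's): rescale the
    continuous weights so their mean square equals target_ms, then round each
    weight to the nearest integer and clamp it to {-L, ..., L}."""
    q = sum(x * x for x in w) / len(w)
    s = math.sqrt(target_ms / q) if q > 0 else 0.0
    return [max(-L, min(L, round(s * x))) for x in w]


def overlap_error(u, v):
    """Generalization error arccos(rho)/pi of a binary perceptron u with respect
    to a teacher v, valid for isotropic Gaussian inputs."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.5
    rho = max(-1.0, min(1.0, dot / (nu * nv)))
    return math.acos(rho) / math.pi


def train_and_clip(N=100, L=2, alpha=20.0, seed=0):
    rng = random.Random(seed)
    teacher = [rng.randint(-L, L) for _ in range(N)]  # discrete teacher
    w = [0.0] * N  # continuous student
    for _ in range(int(alpha * N)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        y = sum(b * x for b, x in zip(teacher, xi))
        label = 1.0 if y >= 0 else -1.0  # binary teacher output
        # On-line Hebb step: each example is used once, then discarded.
        for j in range(N):
            w[j] += label * xi[j] / math.sqrt(N)
    # Mean square of a weight drawn uniformly from {-L, ..., L}.
    target_ms = L * (L + 1) / 3.0
    return teacher, w, clip_weights(w, L, target_ms)
```

Usage: `teacher, w, clipped = train_and_clip()` returns the teacher, the continuous student and its clipped version; comparing `overlap_error(w, teacher)` with `overlap_error(clipped, teacher)`, or counting the weights on which `clipped` and `teacher` agree, shows how much the discrete prior is being exploited.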



“…The study of online backpropagation as put forward by Biehl and Schwarze [5] and later developed in [6,7] has permitted the analytical understanding of several properties of the dynamics of the learning process. The most striking feature is the existence of learning plateaux, or symmetric phases, which signal learning stages where the information available to the student and the form in which it is used do not permit breaking the permutation symmetry among the hidden nodes.…”

Mentioning (classification confidence: 99%)

Vicente, Caticha (1997). Functional optimization of online algorithms in multilayer neural networks. J. Phys. A: Math. Gen.


We study the online dynamics of learning in fully connected soft committee machines in the student-teacher scenario. The locally optimal modulation function, which determines the learning algorithm, is obtained from a variational argument in such a manner as to maximise the average generalisation error decay per example. Simulation results for the resulting algorithm are presented for a few cases. The symmetric phase plateaux are found to be vastly reduced in comparison to those found when online backpropagation algorithms are used. A discussion of the implementation of these ideas as practical algorithms is given.

Key words: neural networks, generalisation, backpropagation, learning algorithms. PACS. #: 87.10.e+10, 05.90.+m, 64.60.Cn

Learning how learning occurs in artificial systems has caught the attention of the Statistical Mechanics community in the last decade. This interest was ignited by several developments, ranging from the invention of efficient learning-from-examples methods such as backpropagation, which permit learning in computationally complex machines, to the realisation that ideas from disordered systems, in particular spin glasses, could be applied to the study of attractor as well as feedforward neural networks, and to the general interest in complex systems with rugged energy landscapes. The main results of the Statistical Mechanics approach (see e.g. [1-3]) have almost invariably been obtained in the thermodynamic limit and have benefited from the powerful techniques used to calculate averages over the disorder introduced by the random nature of the examples.

Among several possible approaches to machine learning, online learning [4] has been the subject of an intense research effort due to several factors. In this scheme, examples are used only once, thereby avoiding the need for the expensive memory resources typical of offline methods. This, however, does not necessarily translate into poor performance, since efficient methods can be devised whose performance is comparable to that of memory-based ones. Furthermore, learning sequentially from single examples has a greater biological flavor than offline processing. While efficiency, computational economy and biological relevance may be the most relevant factors, the theoretical possibility of rather complete analytical studies has also played an important role. If each of these factors is, by itself, sufficiently important to make online learning an attractive scheme, together they make a most compelling argument for its thorough study.

In this letter we present results of the optimisation of online supervised learning in a model consisting of a fully connected multilayer feedforward neural network, in what has become known as the student-teacher scenario. The type of result we present here brings together two separate lines of research that have recently been pursued by several groups. The study of online backpropagation as put forward by Biehl and Schwarze [5] and later developed in [6,7] has permitted the analytical understand...
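To make the student-teacher scenario concrete, here is a sketch of plain on-line gradient descent (on-line backpropagation, not the optimised modulation function derived in the paper) for a soft committee machine with K hidden units learning from a teacher with M hidden units. Assumptions: Gaussian inputs, transfer function g(x) = erf(x/√2), hidden fields scaled as w·ξ/√N, a weight update of size η/√N, and a Monte Carlo estimate of the generalisation error on fresh examples. All function names and parameter values are hypothetical.

```python
import math
import random


def g(x):
    # Transfer function g(x) = erf(x / sqrt(2)).
    return math.erf(x / math.sqrt(2.0))


def g_prime(x):
    # Derivative of erf(x / sqrt(2)).
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)


def fields(weights, xi):
    n = len(xi)
    return [sum(a * b for a, b in zip(w, xi)) / math.sqrt(n) for w in weights]


def estimate_error(student, teacher, n, n_samples, rng):
    # Monte Carlo estimate of eps_g = < (sigma - tau)^2 / 2 > on fresh inputs.
    total = 0.0
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sigma = sum(g(u) for u in fields(student, xi))
        tau = sum(g(v) for v in fields(teacher, xi))
        total += 0.5 * (sigma - tau) ** 2
    return total / n_samples


def online_sgd(N=50, K=2, M=2, eta=0.5, alpha=30.0, seed=1):
    rng = random.Random(seed)
    teacher = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    # Small random initial weights: hidden units start out nearly identical.
    student = [[1e-3 * rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(K)]
    eval_rng = random.Random(seed + 1)
    errors = [estimate_error(student, teacher, N, 500, eval_rng)]
    for _ in range(int(alpha * N)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]  # fresh example, used once
        x = fields(student, xi)
        y = fields(teacher, xi)
        diff = sum(g(v) for v in y) - sum(g(u) for u in x)
        for i in range(K):
            step = eta * diff * g_prime(x[i]) / math.sqrt(N)
            student[i] = [w + step * s for w, s in zip(student[i], xi)]
    errors.append(estimate_error(student, teacher, N, 500, eval_rng))
    return errors
```

Because the student starts near the permutation-symmetric point, recording the student-teacher overlaps w_i·B_n/N during training (rather than only the initial and final error returned here) is how the symmetric plateau would show up in such a simulation.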


“…A common choice for the transfer function is g(x) = erf(x/√2). With this specific choice, the averaging in the equations of motion (30,31,32) can be performed analytically for general K and M [22,23,24]. Independent of the particular choice of learning algorithm, a general problem in two-layered networks is caused by the inherent permutation symmetry: the i-th input branch of the adaptive network (24) does not necessarily specialize on the i-th branch in the network (25). Without loss of generality, however, one can relabel the dynamical variables as if this were indeed the case.…”
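For g(x) = erf(x/√2) the Gaussian averages have a closed form, ⟨g(u)g(v)⟩ = (2/π) arcsin(C_uv/√((1+C_uu)(1+C_vv))) for zero-mean jointly Gaussian u, v, so the generalisation error ε_g = ½⟨(Σ_i g(x_i) − Σ_n g(y_n))²⟩ depends only on the overlaps Q (student-student), R (student-teacher) and T (teacher-teacher). A short Python sketch, assuming overlaps normalised as Q_ik = w_i·w_k/N and similarly for R and T; the function name is hypothetical.

```python
import math


def _asin_term(cov, var_a, var_b):
    # <g(u) g(v)> = (2/pi) * this value, for g(x) = erf(x / sqrt(2)).
    return math.asin(cov / math.sqrt((1.0 + var_a) * (1.0 + var_b)))


def generalization_error(Q, R, T):
    """eps_g for a soft committee machine with K student and M teacher units.
    Q: K x K student-student, R: K x M student-teacher, T: M x M teacher-teacher."""
    K, M = len(Q), len(T)
    ss = sum(_asin_term(Q[i][k], Q[i][i], Q[k][k]) for i in range(K) for k in range(K))
    tt = sum(_asin_term(T[n][m], T[n][n], T[m][m]) for n in range(M) for m in range(M))
    st = sum(_asin_term(R[i][n], Q[i][i], T[n][n]) for i in range(K) for n in range(M))
    # (1/2) * (2/pi) * [ss + tt - 2 st]
    return (ss + tt - 2.0 * st) / math.pi
```

With T the 2 × 2 identity, a student that reproduces the teacher (Q = R = T) gives ε_g = 0, and so does a student whose hidden units are the teacher's in swapped order (R = [[0, 1], [1, 0]]), which is the permutation symmetry referred to in the excerpt. A student with vanishing weights (Q = R = 0) gives ε_g = 1/3.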

Section: XI (mentioning; classification confidence: 99%)

Biehl, Caticha, Riegler (2009). Statistical Mechanics of On-line Learning. Lecture Notes in Computer Science.


Abstract. We introduce and discuss the application of statistical physics concepts in the context of on-line machine learning processes. The consideration of typical properties of very large systems allows one to perform averages over the randomness contained in the sequence of training data. It yields an exact mathematical description of the training dynamics in model scenarios. We present the basic concepts and results of the approach in terms of several examples, including the learning of linearly separable rules, the training of multilayer neural networks, and Learning Vector Quantization.
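As a small worked example of such an exact description (chosen here for illustration, not taken from the lecture notes), consider on-line Hebbian learning of a perceptron teacher B with B·B/N = 1, using the update w ← w + sign(y) ξ/√N with y = B·ξ/√N. As N → ∞ the overlaps R = w·B/N and Q = w·w/N obey dR/dα = √(2/π) and dQ/dα = 2√(2/π)R + 1, so starting from w = 0 one gets R = √(2/π)α, Q = (2/π)α² + α, and ε_g = arccos(R/√Q)/π, which decays as 1/√(2πα) for large α. The Python sketch below compares this closed form with a finite-N simulation; function names and parameters are hypothetical.

```python
import math
import random


def hebb_theory(alpha):
    """Closed-form eps_g(alpha) for on-line Hebbian learning from w = 0:
    R/sqrt(Q) = 1 / sqrt(1 + pi / (2 alpha))."""
    if alpha <= 0:
        return 0.5
    rho = 1.0 / math.sqrt(1.0 + math.pi / (2.0 * alpha))
    return math.acos(rho) / math.pi


def hebb_simulation(N=200, alpha=4.0, seed=3):
    rng = random.Random(seed)
    B = [rng.gauss(0.0, 1.0) for _ in range(N)]
    norm_B = math.sqrt(sum(b * b for b in B))
    B = [b * math.sqrt(N) / norm_B for b in B]  # normalise so B.B/N = 1
    w = [0.0] * N
    for _ in range(int(alpha * N)):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        y = sum(b * x for b, x in zip(B, xi)) / math.sqrt(N)
        label = 1.0 if y >= 0 else -1.0
        w = [wj + label * x / math.sqrt(N) for wj, x in zip(w, xi)]
    R = sum(a * b for a, b in zip(w, B)) / N
    Q = sum(a * a for a in w) / N
    rho = max(-1.0, min(1.0, R / math.sqrt(Q)))
    return math.acos(rho) / math.pi
```

The simulated error fluctuates around the theoretical curve by an amount that shrinks as N grows, which is the sense in which the thermodynamic-limit description is exact.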

