1999
DOI: 10.1007/s100510050889

Statistical physics and practical training of soft-committee machines

Abstract: Equilibrium states of large layered neural networks with differentiable activation function and a single, linear output unit are investigated using the replica formalism. The quenched free energy of a student network with a very large number of hidden units learning a rule of perfectly matching complexity is calculated analytically. The system undergoes a first order phase transition from unspecialized to specialized student configurations at a critical size of the training set. Computer simulations of learnin…

Citations: cited by 14 publications (25 citation statements)
References: 24 publications (49 reference statements)

“…The term Soft Committee Machine (SCM) has been coined for feedforward neural networks with sigmoidal activations in a single hidden layer and a linear output unit (see, for instance, [30, 31, 32, 33, 34, 35, 36, 55, 56]). Its structure resembles that of a (crisp) committee machine with binary threshold hidden units, where the network’s response is given by their majority vote (see [5, 6, 7] and references therein).…”
Section: Models and Mathematical Analysis
mentioning
confidence: 99%
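
The contrast drawn in this statement can be made concrete with a minimal sketch. It assumes `tanh` as the sigmoidal activation, hidden-to-output weights fixed to one, and the 1/sqrt(N) and 1/sqrt(K) scalings that are a common convention; none of these choices is stated in the quoted text, and the function names are hypothetical.

```python
import numpy as np

def scm_output(weights, x, g=np.tanh):
    """Soft committee machine: sum of sigmoidal hidden-unit activations.

    weights has shape (K, N): one weight vector per hidden unit.
    The activation g and both scalings are assumptions of this sketch.
    """
    K, N = weights.shape
    fields = weights @ x / np.sqrt(N)      # local field of each hidden unit
    return g(fields).sum() / np.sqrt(K)    # linear output unit, fixed unit weights

def committee_output(weights, x):
    """Crisp committee machine: majority vote of binary threshold units."""
    K, N = weights.shape
    fields = weights @ x / np.sqrt(N)
    return np.sign(np.sign(fields).sum())  # +1 or -1 for odd K and nonzero fields
```

With an odd number of hidden units and nonzero fields, the crisp machine returns ±1, while the soft machine returns a real value that varies smoothly with the weights.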
“…Here G_r is an effective Hamiltonian and the entropy term s = (1/2) ln det C, where C is the K(n + 1) × K(n + 1)-dimensional matrix of the order parameters [10]. We have further introduced the rescaled number of examples, α = P/(NK).…”
mentioning
confidence: 99%
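
The two quantities defined in this statement, α = P/(NK) and s = (1/2) ln det C, can be evaluated numerically as in the sketch below. It takes the matrix C as given and does not construct it from order parameters; the function names are hypothetical and do not come from the cited work.

```python
import numpy as np

def rescaled_examples(P, N, K):
    """alpha = P / (N K): number of examples per adjustable weight."""
    return P / (N * K)

def entropy_term(C):
    """s = (1/2) ln det C for a symmetric positive-definite matrix C.

    slogdet is used instead of log(det(C)) to avoid overflow or
    underflow of the determinant for large matrices.
    """
    sign, logdet = np.linalg.slogdet(C)
    if sign <= 0:
        raise ValueError("C must have a positive determinant")
    return 0.5 * logdet
```

For example, P = 600 examples with N = 100 inputs and K = 3 hidden units gives α = 2, and the identity matrix gives s = 0.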
“…The remaining order parameters Q_0, Q_1, δ_0, δ_1 and m parametrize the distribution of overlaps between the weight vectors of different students. Note that as in [10], using the saddle point equations for the free energy, one may analytically eliminate the unspecialized order parameters R, Q, Q_0 and Q_1.…”
mentioning
confidence: 99%
“…The term Soft Committee Machine (SCM) has been established for shallow feedforward neural networks with a single hidden layer and a linear output unit, see for instance [2,8,9,11,26,42,44,45,49]. Its structure resembles that of a (crisp) committee machine with binary threshold hidden units, where the network output is given by their majority vote, see [4,19,53] and references therein.…”
Section: Layered Neural Network
mentioning
confidence: 99%