1998
DOI: 10.1209/epl/i1998-00466-6

Phase transitions in soft-committee machines

Abstract: Equilibrium statistical physics is applied to layered neural networks with differentiable activation functions. A first analysis of off-line learning in soft-committee machines with a finite number (K) of hidden units learning a perfectly matching rule is performed. Our results are exact in the limit of high training temperatures (β → 0). For K = 2 we find a second order phase transition from unspecialized to specialized student configurations at a critical size P of the training set, whereas for K ≥ 3 the tra…

Cited by 14 publications (14 citation statements). References 25 publications.
“…orthogonal space are reflected by non-trivial configurations of {Q_ij}. The underlying cluster structure is not at all detected as long as α̃ is smaller than the critical value α̃_c. This parallels findings for supervised learning in neural networks with two hidden units [5] or unsupervised learning scenarios [10,16]. Above α̃_c, prototypes begin to align with the clusters and the system becomes specialized, i.e.…”
Section: Specialization Transition in the Training Process (supporting)
confidence: 72%
“…Similar effects of “retarded learning” have been studied in several models and learning scenarios earlier, e.g. [5,6,8,10,16].…”
Section: Introduction (mentioning)
confidence: 56%
“…This type of network consists of a layer with K hidden units, all of which are connected with the entire input, and the total output of the net is proportional to the sum of their states. Previous studies have addressed large soft committees (K → ∞) with binary weights within the so-called Annealed Approximation [14] or networks with finite K in the limit of high training temperature [20].…”
(mentioning)
confidence: 99%
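The architecture described in the excerpt above can be made concrete with a minimal sketch (ours, not the paper's code; the function name and the tanh activation are illustrative assumptions):

```python
import numpy as np

def soft_committee(J, xi, g=np.tanh):
    """Soft-committee machine: each of the K hidden units sees the full
    N-dimensional input xi, and the total output is proportional to the
    sum of the hidden-unit states."""
    K, N = J.shape
    x = J @ xi / np.sqrt(N)         # hidden-unit fields x_j
    return g(x).sum() / np.sqrt(K)  # scaled sum of hidden-unit states
```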
“…We assume an isotropic teacher with orthonormal weight vectors: B_j · B_k = N δ_jk for all j, k. The training of a perfectly matching student with outputs σ(ξ) = Σ_{j=1}^K g(x_j)/√K is considered, where the arguments x_j = J_j · ξ/√N are defined through adaptive weights J_j with J_j² = N. The particular choice of the hidden unit activation function, g(x) = erf(x/√2), simplifies the mathematical treatment to a large extent [7,8,20]. We expect, however, that our results apply qualitatively to a large class of sigmoidal functions, including the very similar and frequently used hyperbolic tangent.…”
(mentioning)
confidence: 99%
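A hedged sketch of the matched teacher-student setup quoted above, assuming NumPy/SciPy; all function and variable names are our own illustration, not the authors' code. The teacher rows satisfy B_j · B_k = N δ_jk, the student weights are normalized to J_j² = N, and the activation is g(x) = erf(x/√2):

```python
import numpy as np
from scipy.special import erf

def make_teacher(K, N, rng):
    """Isotropic teacher: K rows B_j with B_j . B_k = N * delta_jk,
    built from orthonormal vectors scaled by sqrt(N) (requires K <= N)."""
    Q, _ = np.linalg.qr(rng.standard_normal((N, K)))
    return np.sqrt(N) * Q.T

def sigma(W, xi):
    """sigma(xi) = sum_{j=1}^K g(x_j) / sqrt(K), with hidden fields
    x_j = W_j . xi / sqrt(N) and g(x) = erf(x / sqrt(2))."""
    K, N = W.shape
    return erf(W @ xi / np.sqrt(N) / np.sqrt(2)).sum() / np.sqrt(K)

rng = np.random.default_rng(0)
K, N = 3, 200
B = make_teacher(K, N, rng)
J = rng.standard_normal((K, N))
J *= np.sqrt(N) / np.linalg.norm(J, axis=1, keepdims=True)  # enforce J_j^2 = N

xi = rng.standard_normal(N)        # one random input pattern
print(sigma(B, xi), sigma(J, xi))  # teacher vs. (untrained) student output

# Student-teacher overlaps R_ij = J_i . B_j / N: specialization means
# each row of R develops one dominant entry.
R = J @ B.T / N
print(np.round(R, 2))
```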