2008
DOI: 10.1109/tnn.2008.2000391
Dynamics of Learning in Multilayer Perceptrons Near Singularities

Abstract: The dynamical behavior of learning is known to be very slow for the multilayer perceptron, which is often trapped in a "plateau." It has recently been understood that this is due to the singularity in the parameter space of perceptrons, in which trajectories of learning are drawn. The space is Riemannian from the point of view of information geometry and contains singular regions where the Riemannian metric, or the Fisher information matrix, degenerates. This paper analyzes the dynamics of learning in a neighborh…
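The degeneracy of the Fisher information matrix described in the abstract can be seen in a minimal NumPy sketch (not from the paper itself; the two-hidden-unit perceptron and all parameter values below are illustrative assumptions). On the singular region where the two hidden units carry identical input weights, every score vector lies in a two-dimensional subspace, so the 4×4 Fisher matrix loses rank:

```python
import numpy as np

# Toy MLP: f(x) = v1*tanh(J1*x) + v2*tanh(J2*x), with unit-variance Gaussian
# noise, so the Fisher information is F = E_x[ grad_f grad_f^T ].
def score(x, J1, J2, v1, v2):
    t1, t2 = np.tanh(J1 * x), np.tanh(J2 * x)
    s1, s2 = 1.0 - t1**2, 1.0 - t2**2            # tanh'(u) = 1 - tanh(u)^2
    return np.array([v1 * x * s1, v2 * x * s2, t1, t2])  # df/d(J1,J2,v1,v2)

rng = np.random.default_rng(0)
xs = rng.standard_normal(20000)

# Singular point: the two hidden units have identical input weights J1 = J2.
G = np.stack([score(x, 0.7, 0.7, 0.3, 0.5) for x in xs])
F = G.T @ G / len(xs)                            # Monte Carlo Fisher matrix
eigs = np.linalg.eigvalsh(F)                     # eigenvalues, ascending

# Each score vector has the form (v1*a, v2*a, b, b), a 2-D subspace of R^4,
# so F has rank <= 2: the Riemannian metric degenerates at this point.
print(eigs)   # two eigenvalues at machine zero, two strictly positive
```

Gradient descent has no restoring force along the null directions of F, which is one way to picture why trajectories stall on plateaus near such regions.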

Cited by 42 publications (18 citation statements)
References 21 publications
“…Recently, it has turned out that the singularity has a negative effect on learning dynamics in real-valued neural networks [12,13,15]. That is, the hierarchical structure, or the symmetry under exchange of weights, of real-valued neural networks gives rise to singular points.…”
Section: Problem on the Singularity (mentioning)
confidence: 99%
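The weight-exchange symmetry referred to above can be demonstrated directly: permuting a hidden layer's units, together with the matching output weights, leaves the network function unchanged, so such parameter settings are unidentifiable. A toy illustration (assumed shapes and seed; not code from the cited works):

```python
import numpy as np

def mlp(x, W, v):
    """One-hidden-layer perceptron: x -> tanh(x W) v."""
    return np.tanh(x @ W) @ v

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 3))        # 5 inputs, 3 features
W = rng.standard_normal((3, 4))        # 4 hidden units
v = rng.standard_normal(4)

perm = [2, 0, 3, 1]                    # relabel the hidden units
y_same = mlp(x, W[:, perm], v[perm])   # permuted parameters, same function

assert np.allclose(mlp(x, W, v), y_same)
```

Distinct parameter vectors realizing the same function is exactly the unidentifiability that produces the singular regions discussed in these citations.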
See 1 more Smart Citation
“…Recently, it has turned out that the singularity has a negative effect on learning dynamics in the real-valued neural networks [12,13,15]. That is, the hierarchical structure or a symmetric property on exchange of weights of the the real-valued neural networks have singular points.…”
Section: Problem On the Singularitymentioning
confidence: 99%
“…On the one hand, research on the singularity of learning machines with a hierarchical structure has progressed over the past several years [12][13][14][15]. It has turned out that the singularity has a negative effect on learning dynamics in learning machines such as real-valued neural networks and Gaussian mixture models.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although a generalized linear model allows limited non-linearities, it enjoys tractable and consistent estimation procedures without problems of local minima (Paninski, 2004). Identifying more complex non-linear models like hierarchical neural networks from physiological data tends to be harder due to problems like local minima and plateaus in the error surface (Amari et al, 2006; Cousseau et al, 2008; Wei and Amari, 2008; Wei et al, 2008). …”
Section: Adaptive Optimization for Sensory Model Estimation (mentioning)
confidence: 99%
“…In a recent article, Amari, Park and Ozeki [1] argued that the singular structure is ubiquitous, covering a wide range of models from MLPs, Gaussian mixtures, ARMA time-series and many others, and the learning dynamics will be heavily influenced by the singularities. The special case of MLPs with unidentifiable parameters has been discussed both numerically [12] and analytically [4], and some interesting behaviours have been revealed in these studies.…”
Section: Introduction (mentioning)
confidence: 97%