2019
DOI: 10.48550/arxiv.1901.08987
Preprint

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Dar Gilboa,
Bo Chang,
Minmin Chen
et al.

Abstract: Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from instabilities when trai…
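The exponential explosion or vanishing that the abstract refers to can be seen directly in the singular values of the state-to-state Jacobian of an untrained vanilla RNN; dynamical isometry, as in the paper's title, corresponds to all of these singular values concentrating near 1. The sketch below is not the authors' code; the width, weight scales, and step count are illustrative assumptions.

```python
import numpy as np

def jacobian_singular_values(gain, width=128, steps=50, seed=0):
    rng = np.random.default_rng(seed)
    # i.i.d. Gaussian recurrent weights with variance gain^2 / width
    W = gain * rng.standard_normal((width, width)) / np.sqrt(width)
    h = rng.standard_normal(width)
    J = np.eye(width)
    for _ in range(steps):
        pre = W @ h                    # pre-activation at this step
        h = np.tanh(pre)               # vanilla tanh RNN update (no external input)
        # chain rule: dh_t/dh_0 = diag(tanh'(pre_t)) @ W @ dh_{t-1}/dh_0
        J = np.diag(1.0 - np.tanh(pre) ** 2) @ W @ J
    return np.linalg.svd(J, compute_uv=False)

for gain in (0.5, 1.0, 2.0):           # ordered / near-critical / chaotic weight scales
    sv = jacobian_singular_values(gain)
    print(f"gain={gain}: largest sv={sv[0]:.2e}, smallest sv={sv[-1]:.2e}")
```

With a small gain the singular values collapse toward zero (vanishing signals); with a large gain the leading ones grow exponentially with the number of steps (exploding signals); only near the critical scale does the spectrum stay close to 1.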

Cited by 18 publications (36 citation statements)
References 15 publications

Citation statements (ordered by relevance):
“…Turning to the titular edge of chaos, we are inspired by the aforementioned works [15,16,27-30] examining criticality in various deep network architectures. However, while many of these papers used the phrase "mean-field theory", they did not actually rely on any MFT analysis: as mentioned above, Gaussianity arises simply as a consequence of the central limit theorem (CLT).…”
Section: Relation To Other Work
confidence: 99%
“…As discussed in the introduction and in more detail below, the result holds only at weak 't Hooft coupling, and the perturbative expansion assumes T/N < 1; the latter is the regime of practical relevance for modern deep neural networks [2]. It would be interesting to explore these connections to O(N) theory in more detail, e.g., to see whether the analysis can be extended beyond the perturbative (weak-coupling) regime we consider here.…”
Section: Perturbative Corrections
confidence: 99%
“…It has since appeared in many fields ranging from theoretical neuroscience and neurophysiology [23-25], to biological and complex systems [26,27], and was also explored in an early model of bioplausible neural networks in [28]. More recently, [17,18,29-32] demonstrated that networks initialized at criticality are trainable to far greater depths than those lying further into either phase. To see this, one identifies a correlation length that sets the scale at which correlations between local degrees of freedom (e.g., the activations of two different neurons, or the spins of two different magnetic dipoles) decay with separation or depth through the network.…”
Section: Introduction
confidence: 99%
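The correlation length mentioned in the last statement can be illustrated with a short numerical experiment; this is a sketch under assumed width, depth, and weight scales, not code from any of the cited papers. Two correlated inputs are pushed through a random fully connected tanh network, and the rate at which their correlation approaches its fixed point sets the correlation length: near the critical weight scale the approach is slow, so correlations persist to great depth.

```python
import numpy as np

def correlation_vs_depth(gain, width=1024, depth=60, seed=0):
    rng = np.random.default_rng(seed)
    # two unit-variance inputs with initial correlation of roughly 0.5
    x1 = rng.standard_normal(width)
    x2 = 0.5 * x1 + np.sqrt(1.0 - 0.5 ** 2) * rng.standard_normal(width)
    h1, h2 = x1, x2
    corrs = []
    for _ in range(depth):
        # fresh random layer at every depth, as in the mean-field setting
        W = gain * rng.standard_normal((width, width)) / np.sqrt(width)
        h1, h2 = np.tanh(W @ h1), np.tanh(W @ h2)
        c = np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2))
        corrs.append(float(c))
    return corrs

for gain in (0.9, 1.0, 1.5):            # ordered / near-critical / chaotic weight scales
    print(f"gain={gain}: correlation after 60 layers = {correlation_vs_depth(gain)[-1]:.3f}")
```

In the ordered phase the correlation converges quickly toward 1, in the chaotic phase it quickly settles at a fixed point below 1, and near criticality the convergence slows down, which is the long-correlation-length regime these citing works associate with trainability at depth.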