1998
DOI: 10.1142/s0218488598000094
The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions

Abstract: Recurrent nets are in principle capable of storing past inputs to produce the currently desired output. Because of this property, recurrent nets are used in time series prediction and process control. Practical applications involve temporal dependencies spanning many time steps, e.g. between relevant inputs and desired outputs. In this case, however, gradient-based learning methods take too much time. The extremely long learning time arises because the error vanishes as it gets propagated back. In this artic…
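As a rough illustration of the decay the abstract describes (standard backpropagation-through-time notation, not taken from the paper itself): for a recurrent net with hidden state $h_t = f(W h_{t-1} + U x_t)$, an error signal propagated back over $k$ steps is scaled by a product of $k$ Jacobians,

$$\frac{\partial h_t}{\partial h_{t-k}} \;=\; \prod_{i=t-k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \;=\; \prod_{i=t-k+1}^{t} \operatorname{diag}\!\big(f'(a_i)\big)\, W, \qquad a_i = W h_{i-1} + U x_i,$$

so if every factor has norm bounded by some $\lambda < 1$, the gradient shrinks at least as fast as $\lambda^{k}$ and effectively vanishes across long temporal gaps (and can explode if the factors exceed 1).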

Cited by 2,352 publications (1,079 citation statements)
References 7 publications
“…Thus, training such a network becomes quite difficult, as the vanishing gradient problem arises. To overcome this problem, LSTM was proposed in 1997 [48,49] and has been improved during the recent rapid development of deep learning technology [50]. It is a building unit for RNN layers that can retain short-term memory over long periods of time.…”
Section: LSTM
confidence: 99%
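For readers unfamiliar with the building unit mentioned in the excerpt above, here is a minimal NumPy sketch of a single LSTM step in its standard form with forget, input and output gates; the function and parameter names are illustrative and not taken from any of the cited papers.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x       : input vector, shape (d_in,)
    h_prev  : previous hidden state, shape (d_h,)
    c_prev  : previous cell state, shape (d_h,)
    W, U, b : parameters for the four gates stacked together,
              shapes (4*d_h, d_in), (4*d_h, d_h), (4*d_h,)
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations of all four gates
    i = sigmoid(z[0*d_h:1*d_h])           # input gate
    f = sigmoid(z[1*d_h:2*d_h])           # forget gate
    o = sigmoid(z[2*d_h:3*d_h])           # output gate
    g = np.tanh(z[3*d_h:4*d_h])           # candidate cell update
    c = f * c_prev + i * g                # additive cell-state update
    h = o * np.tanh(c)                    # hidden state passed to the next step
    return h, c

The additive update of the cell state c is what lets the unit carry information, and the corresponding error signal, across many time steps: while the forget gate stays close to 1, the previous cell state is passed on almost unchanged rather than being repeatedly squashed through a nonlinearity.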
“…Therefore, it is common to implement the recurrent layer using an enhanced RNN variant to compute h_t, such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) or the Gated Recurrent Unit (GRU) (Cho et al., 2014), which add an explicit gating mechanism to the traditional RNN in order to control the preservation of information in the hidden state over very long sequences.…”
Section: RNN-based Language Models
confidence: 99%
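The gating mechanism described in the excerpt above can be sketched for the GRU case as follows (NumPy, one common formulation of the unit from Cho et al., 2014; biases are omitted and the parameter names are illustrative).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate hidden state
    h = (1.0 - z) * h_prev + z * h_cand            # blend old and new state
    return h

Because the new hidden state is a convex combination of the previous state and the candidate, the update gate can leave parts of the hidden state untouched over very long sequences, which is the preservation of information the excerpt refers to.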
“…This dependency on previous decisions makes RNNs suitable for processing sequences, conditioning each output on the network's memory, and also allows sequences of different lengths to be processed with the same number of static inputs and outputs. In this work we used Long Short-Term Memory (LSTM) units instead of simple RNN units, as they cope better with the vanishing gradient problem during training [14], and they also maintain long-range dependencies better than conventional RNNs because their gating mechanisms control the information flowing in and out of the layer without corrupting useful information from further in the past [5].…”
Section: Recurrent Neural Network Review
confidence: 99%
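One standard way to see why the gates keep useful information from being corrupted (a textbook observation, not specific to the cited work): with the LSTM cell-state update $c_t = f_t \odot c_{t-1} + i_t \odot g_t$, the direct backward path through the memory contributes

$$\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)$$

(ignoring the indirect dependence of the gates on earlier states), so for a unit whose forget gate stays near 1 and whose input gate stays near 0, both the stored value and the error flowing back through it pass from step to step essentially unchanged, instead of being multiplied at every step by $\operatorname{diag}(f'(a_i))\,W$ as in a plain RNN.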