1998
DOI: 10.1142/s0218488598000094
The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions

Abstract: Recurrent nets are in principle capable of storing past inputs to produce the currently desired output. Because of this property, recurrent nets are used in time series prediction and process control. Practical applications involve temporal dependencies spanning many time steps, e.g. between relevant inputs and desired outputs. In this case, however, gradient-based learning methods take too much time. The extremely long learning time arises because the error vanishes as it gets propagated back. In this artic…
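As a rough illustration of the decay the abstract describes (standard backpropagation-through-time notation, not taken from the paper itself): for a recurrent net with hidden state $h_t = f(W h_{t-1} + U x_t)$, an error signal propagated back over $k$ steps is scaled by a product of $k$ Jacobians,

$$\frac{\partial h_t}{\partial h_{t-k}} \;=\; \prod_{i=t-k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} \;=\; \prod_{i=t-k+1}^{t} \operatorname{diag}\!\big(f'(a_i)\big)\, W, \qquad a_i = W h_{i-1} + U x_i,$$

so if every factor has norm bounded by some $\lambda < 1$, the gradient shrinks at least as fast as $\lambda^{k}$ and effectively vanishes across long temporal gaps (and can explode if the factors exceed 1).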

Cited by 2,352 publications (1,079 citation statements)
References 7 publications
“…Thus, training such a network becomes quite difficult, as the vanishing gradient problem arises. To overcome this problem, LSTM was proposed in 1997 [48,49] and has been improved during the recent rapid development of deep learning technology [50]. It is a building unit for RNN layers that can retain short-term memory over long periods of time.…”
Section: LSTM
confidence: 99%
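For readers unfamiliar with the building unit mentioned in the excerpt above, here is a minimal NumPy sketch of a single LSTM step in its standard form with forget, input and output gates; the function and parameter names are illustrative and not taken from any of the cited papers.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x       : input vector, shape (d_in,)
    h_prev  : previous hidden state, shape (d_h,)
    c_prev  : previous cell state, shape (d_h,)
    W, U, b : parameters for the four gates stacked together,
              shapes (4*d_h, d_in), (4*d_h, d_h), (4*d_h,)
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b            # pre-activations of all four gates
    i = sigmoid(z[0*d_h:1*d_h])           # input gate
    f = sigmoid(z[1*d_h:2*d_h])           # forget gate
    o = sigmoid(z[2*d_h:3*d_h])           # output gate
    g = np.tanh(z[3*d_h:4*d_h])           # candidate cell update
    c = f * c_prev + i * g                # additive cell-state update
    h = o * np.tanh(c)                    # hidden state passed to the next step
    return h, c

The additive update of the cell state c is what lets the unit carry information, and the corresponding error signal, across many time steps: while the forget gate stays close to 1, the previous cell state is passed on almost unchanged rather than being repeatedly squashed through a nonlinearity.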
“…Therefore, it is common to implement the recurrent layer using an enhanced RNN variant to compute h_t, such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) or the Gated Recurrent Unit (GRU) (Cho et al., 2014), which add an explicit gating mechanism to the traditional RNN in order to control the preservation of information in the hidden state over very long sequences.…”
Section: RNN-based Language Models
confidence: 99%
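The gating mechanism described in the excerpt above can be sketched for the GRU case as follows (NumPy, one common formulation of the unit from Cho et al., 2014; biases are omitted and the parameter names are illustrative).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h_prev)              # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev))   # candidate hidden state
    h = (1.0 - z) * h_prev + z * h_cand            # blend old and new state
    return h

Because the new hidden state is a convex combination of the previous state and the candidate, the update gate can leave parts of the hidden state untouched over very long sequences, which is the preservation of information the excerpt refers to.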
“…This dependency on previous decisions makes RNNs suitable for processing sequences, conditioning each output on the network's memory, and also allows sequences of different lengths to be processed with the same number of static inputs and outputs. In this work we used Long Short-Term Memory (LSTM) units instead of simple RNN units, as they cope better with the vanishing gradient problem during training [14], and they also maintain long-range dependencies better than conventional RNNs because their gating mechanisms control the information flowing in and out of the layer without corrupting useful information from further in the past [5].…”
Section: Recurrent Neural Network Review
confidence: 99%
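One standard way to see why the gates keep useful information from being corrupted (a textbook observation, not specific to the cited work): with the LSTM cell-state update $c_t = f_t \odot c_{t-1} + i_t \odot g_t$, the direct backward path through the memory contributes

$$\frac{\partial c_t}{\partial c_{t-1}} \approx \operatorname{diag}(f_t)$$

(ignoring the indirect dependence of the gates on earlier states), so for a unit whose forget gate stays near 1 and whose input gate stays near 0, both the stored value and the error flowing back through it pass from step to step essentially unchanged, instead of being multiplied at every step by $\operatorname{diag}(f'(a_i))\,W$ as in a plain RNN.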