2022
DOI: 10.1016/j.neunet.2022.05.030
Successfully and efficiently training deep multi-layer perceptrons with logistic activation function simply requires initializing the weights with an appropriate negative mean

Cited by 14 publications (5 citation statements)
References 30 publications
“…A major challenge in RNNs is the vanishing gradient problem: the gradients of the loss function with respect to the network's parameters become extremely small as they are back-propagated from the output layer to the earlier layers during training [45]. Long Short-Term Memory (LSTM) networks are RNNs designed to handle the vanishing gradient problem by replacing simple recurrent connections with a gated memory cell [46].…”
Section: Addressing Temporal Dependence and Feature Selection
confidence: 99%
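As a minimal illustration of the vanishing-gradient behaviour described in this statement, the following sketch (assuming PyTorch; the sequence length, batch size and hidden width are arbitrary choices) compares how much gradient reaches the first time step for a plain recurrent cell versus an LSTM cell:

```python
# Minimal sketch (assuming PyTorch): compare the gradient that reaches the
# earliest time step for a plain RNN versus an LSTM on a long sequence.
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, hidden = 100, 4, 32
x = torch.randn(seq_len, batch, hidden, requires_grad=True)

for name, cell in [("RNN", nn.RNN(hidden, hidden)),     # simple recurrent connections
                   ("LSTM", nn.LSTM(hidden, hidden))]:  # gated memory cell
    out, _ = cell(x)
    loss = out[-1].sum()              # loss defined on the final time step only
    loss.backward()
    # Norm of the gradient flowing all the way back to the first time step.
    print(f"{name}: grad norm at t=0 = {x.grad[0].norm().item():.2e}")
    x.grad = None                     # reset before the next cell
```

The printed norms give a rough feel for how strongly each architecture attenuates gradients over the sequence; the exact numbers depend on the random initialization and the chosen sequence length.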
“…We refer to this benchmark strategy as "LR" (Linear Regression). As further benchmark strategies, we use Xavier initialization (Glorot & Bengio, 2010) in all layers, including the last layer, Kaiming initialization (He et al., 2015), LeCun initialization (LeCun et al., 2012), Yilmaz & Poli initialization (Yilmaz & Poli, 2022) and orthogonal initialization (Saxe et al., 2013).…”
Section: Benchmark Strategies
confidence: 99%
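For readers who want to see these strategies side by side, here is a brief hedged sketch (assuming PyTorch, whose torch.nn.init module provides the Xavier, Kaiming and orthogonal initializers; the LeCun and Yilmaz & Poli variants are written out by hand, and the negative mean used for the latter is an illustrative placeholder rather than the value prescribed in Yilmaz & Poli, 2022):

```python
# Sketch (assuming PyTorch) of the benchmark weight-initialization strategies
# named above, applied to a single fully connected layer.
import math
import torch.nn as nn

layer = nn.Linear(256, 128)
fan_in = layer.in_features

# Xavier / Glorot initialization (Glorot & Bengio, 2010)
nn.init.xavier_uniform_(layer.weight)

# Kaiming / He initialization (He et al., 2015)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")

# Orthogonal initialization (Saxe et al., 2013)
nn.init.orthogonal_(layer.weight)

# LeCun initialization (LeCun et al., 2012): zero-mean normal, variance 1/fan_in
nn.init.normal_(layer.weight, mean=0.0, std=math.sqrt(1.0 / fan_in))

# Yilmaz & Poli (2022): normal draw with a negative mean, aimed at deep MLPs
# with logistic activations. The mean of -1.0 here is an assumed placeholder,
# not the paper's prescription.
nn.init.normal_(layer.weight, mean=-1.0, std=math.sqrt(1.0 / fan_in))

nn.init.zeros_(layer.bias)
```

In practice only one of these calls would be applied to a given layer; they are listed sequentially here purely for comparison.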
“…The initial loss of the other three benchmark strategies is large, with a validation RMSE of around 90 after weight initialization. Moreover, just as with the Xavier initialization strategy, the neural network gets stuck in a local optimum with the LeCun (LeCun et al., 2012), orthogonal (Saxe et al., 2013) and Yilmaz & Poli (Yilmaz & Poli, 2022) initialization strategies. In the end, however, the neural network converges to a similar final training, validation and test accuracy for all considered benchmark strategies; Yilmaz & Poli initialization even reaches the same final test accuracy.…”
Section: Table
confidence: 99%
“…For these reasons, LSTM networks stand out from other methods as an alternative machine learning approach. They are especially used for long-term time series analysis and have emerged as a solution method for such problems [20].…”
Section: Introduction
confidence: 99%