A new approach for the vanishing gradient problem on sigmoid activation (2020)
DOI: 10.1007/s13748-020-00218-y

Cited by 89 publications (44 citation statements) · References 13 publications
“…sigmoid and hyperbolic tangent activation functions). However, stacked neuron layers with these activation functions often suffer from the vanishing and exploding gradient problems and are hence not suitable for our purpose [38,39]. Meanwhile, the Rectified Linear Unit (ReLU) activation function is usually deployed instead as the default choice for multilayer perceptrons and convolutional neural networks due to its generally good performance and faster learning.…”
Section: Neural Network Design for Sobolev Training of a Smooth Scalar Functional with Physical Constraints (mentioning confidence: 99%)
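The contrast the quoted passage draws between saturating activations and ReLU can be shown with a small numerical toy, not taken from the cited paper: the depth, input distribution, and the reuse of one pre-activation per layer are simplifying assumptions. Because the sigmoid derivative never exceeds 0.25, multiplying such factors over many layers drives the gradient toward zero, while ReLU contributes a factor of 1 on active units.

```python
# Toy illustration of the derivative factors multiplied during backpropagation.
# The same pre-activations are reused at every layer purely for simplicity.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
depth = 20
x = rng.normal(size=100)

grad_sigmoid = np.ones_like(x)
grad_relu = np.ones_like(x)
for _ in range(depth):
    s = sigmoid(x)
    grad_sigmoid *= s * (1.0 - s)       # sigmoid'(x) <= 0.25, so the product decays fast
    grad_relu *= (x > 0).astype(float)  # ReLU'(x) is 1 on active units, 0 otherwise

print("mean |grad| after 20 sigmoid layers:", np.abs(grad_sigmoid).mean())
print("mean |grad| after 20 ReLU layers:   ", np.abs(grad_relu).mean())
```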
“…Firstly, increasing the number of layers increases the number of parameters in the network, which raises its storage requirements and computational complexity. Secondly, greater model depth raises the risk of vanishing gradients [20], so the parameters of the shallow convolution kernels cannot be updated effectively. Thirdly, more training data are needed to prevent the model from overfitting [21].…”
Section: Problem Definition (mentioning confidence: 99%)
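The first point, parameter growth with depth, can be made concrete with a back-of-the-envelope count. This is a hedged sketch only: the 3×3 kernels and the fixed 64-channel width are illustrative assumptions, not figures from the cited work.

```python
# Toy parameter count for a stack of convolutional layers.
# Assumed shapes: 3-channel input, 3x3 kernels, 64 channels throughout.
def conv_params(in_ch, out_ch, k=3):
    return in_ch * out_ch * k * k + out_ch  # weights + biases

channels = 64
for depth in (4, 8, 16, 32):
    total = conv_params(3, channels) + (depth - 1) * conv_params(channels, channels)
    print(f"{depth:2d} conv layers -> {total:,} parameters")
```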
“…The motivation for using neural networks in this work, in particular LSTM, is their proven efficiency in sentiment analysis. Although RNNs have proved successful in several tasks, they are well known to suffer from a significant drawback, namely the vanishing-gradient problem [61], which makes it difficult for the network to learn long-distance dependencies. The use of memory units in the network helps to overcome this drawback.…”
Section: Long Short-Term Memory (LSTM) (mentioning confidence: 99%)
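For readers unfamiliar with the memory units the quote refers to, here is a minimal sketch of an LSTM layer, assuming PyTorch is available; the layer sizes and sequence length are placeholders, not taken from the cited sentiment-analysis work. The cell state returned alongside the hidden state is the "memory unit" that helps gradients survive over long spans.

```python
# Minimal LSTM sketch (PyTorch assumed installed); sizes are illustrative only.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 50, 8)          # 4 sequences, 50 time steps, 8 features each
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 50, 16]) -- hidden state at every time step
print(h_n.shape)     # torch.Size([1, 4, 16])  -- final hidden state
print(c_n.shape)     # torch.Size([1, 4, 16])  -- cell state (the "memory unit")
```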