2017 International Artificial Intelligence and Data Processing Symposium (IDAP)
DOI: 10.1109/idap.2017.8090299
Comparison of the stochastic gradient descent based optimization techniques

Cited by 73 publications (29 citation statements); references 1 publication.
“…Adadelta optimization was used; it is widely used in deep learning among gradient descent-based optimization methods [42]. Adadelta is an extension of Adagrad that aims to mitigate its aggressive, monotonically decreasing learning rate [43, 44], and it restricts the window of accumulated past gradients to a fixed size rather than accumulating all past squared gradients [43]. The vector of decaying averages over past squared gradients was represented as shown in Equation (8): …”
Section: Methods
confidence: 99%
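To make the quoted description concrete, here is a minimal NumPy sketch of the Adadelta update it refers to: a decaying average of past squared gradients replaces Adagrad's full accumulation, so the step size does not shrink monotonically. The function name and the rho/eps defaults are illustrative assumptions, not values taken from the cited papers.

```python
import numpy as np

def adadelta_step(params, grads, eg2, edx2, rho=0.95, eps=1e-6):
    """One Adadelta update (illustrative sketch; rho and eps are assumed defaults)."""
    eg2 = rho * eg2 + (1.0 - rho) * grads ** 2                  # decaying average of squared gradients
    delta = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grads   # unit-corrected parameter update
    edx2 = rho * edx2 + (1.0 - rho) * delta ** 2                # decaying average of squared updates
    return params + delta, eg2, edx2

# Minimal usage on f(x) = ||x||^2, whose gradient is 2x:
x = np.array([1.0, -2.0])
eg2, edx2 = np.zeros_like(x), np.zeros_like(x)
for _ in range(1000):
    x, eg2, edx2 = adadelta_step(x, 2.0 * x, eg2, edx2)
```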
“…Since the training sample data is large, the mini-batch gradient descent (MBGD) algorithm is used as the optimization algorithm, which reduces the computational cost and speeds up training. Moreover, all the models are trained using the RMSprop optimizer …”
Section: Artificial Neural Network Approaches and Model for Short-Term
confidence: 99%
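As a rough illustration of the combination described above (mini-batch gradient descent with RMSprop-scaled updates), the sketch below shows the core loop in NumPy. The function signature, batch size, and hyperparameter defaults are assumptions, not the settings of the cited model.

```python
import numpy as np

def train_rmsprop_minibatch(params, grad_fn, data, batch_size=32,
                            lr=1e-3, beta=0.9, eps=1e-8, epochs=1):
    """Mini-batch gradient descent with RMSprop step scaling (illustrative).

    grad_fn(params, batch) must return the gradient of the loss on one mini-batch.
    """
    sq_avg = np.zeros_like(params)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        order = rng.permutation(len(data))                  # shuffle once per epoch
        for start in range(0, len(data), batch_size):
            batch = data[order[start:start + batch_size]]   # draw one mini-batch
            g = grad_fn(params, batch)
            sq_avg = beta * sq_avg + (1.0 - beta) * g ** 2  # running average of squared gradients
            params = params - lr * g / (np.sqrt(sq_avg) + eps)
    return params
```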
“…The training parameters of all the CNNs are the same: the learning rate is 0.01, the total number of training steps is 50,000, and the optimizer is root mean square prop (RMSProp) [28]. The results are shown in Table 2. …”
Section: Event Recognition
confidence: 99%
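The quoted configuration translates directly into a standard deep-learning framework; below is a hedged PyTorch sketch. Only the RMSProp choice, the 0.01 learning rate, and the 50,000 training steps come from the statement; the toy network and the next_batch() loader are hypothetical placeholders, not the cited CNNs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder model; the cited CNNs have their own architectures.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)  # settings from the quoted statement

for step in range(50_000):                  # 50,000 training steps
    inputs, targets = next_batch()          # hypothetical mini-batch loader
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
```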