2023
DOI: 10.3390/math11030682

Recent Advances in Stochastic Gradient Descent in Deep Learning

Abstract: In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a tremendously motivating and hard problem. Among machine learning optimization methods, stochastic gradient descent (SGD) is not only simple but also very effective. This study provides a detailed analysis of contemporary state-of-the-art deep learning applications, such as natural language processing (NLP), visual data processing, and voice and audio processing. Following that, this study introduces several variants of SGD and its…
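For reference (standard textbook form, not quoted from the abstract), the basic update that these SGD variants build on is

$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta \ell(\theta_t; x_{i_t}, y_{i_t})$,

where $(x_{i_t}, y_{i_t})$ is the minibatch sampled at step $t$ and $\eta$ is the learning rate.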

Cited by 72 publications (31 citation statements)
References 47 publications
“…The learning rate was 0.005. Optimisation utilised the stochastic gradient descent method (Tian et al., 2023), as this is computationally faster and can converge more quickly than other optimisation algorithms. The loss function used was cross-entropy loss.…”
Section: Methods
confidence: 99%
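A minimal sketch of the training setup described in this excerpt, assuming a PyTorch workflow; the model, input size, and dummy minibatch are placeholders, and only the optimizer choice, the learning rate of 0.005, and the cross-entropy loss come from the quoted text.

import torch
import torch.nn as nn

# Placeholder model and data; only the optimizer, lr, and loss follow the excerpt.
model = nn.Linear(128, 10)                         # stand-in classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(32, 128)                      # dummy minibatch
targets = torch.randint(0, 10, (32,))              # dummy class labels

optimizer.zero_grad()
loss = criterion(model(inputs), targets)           # cross-entropy loss
loss.backward()                                    # backpropagate gradients
optimizer.step()                                   # SGD parameter update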
“…To better understand the REMR algorithm and its main steps, its pseudo-code is introduced in Algorithm 1. REMR does not really include an approximation process by hyperparameter optimization, which is usually performed via gradient descent algorithms; this is more like the submerging of deep networks [11]. Therefore, the traditional equations of the loss function will not be useful in evaluating its convergence behavior at every round.…”
Section: $\tilde{y}_{k+1} = f(x_{k+1}),\; k = 1 \to m$
confidence: 99%
“…Since the advent of deep learning and neural networks, there have been studies of numerical optimization algorithms from the perspective of machine learning. In particular, first-order algorithms, like gradient descent and its variants, have been widely used in machine learning and data analysis research [1,29,37,39,55,59,61]. While first-order algorithms benefit from being memory efficient, low-cost per iteration, and simple to implement, they are also notoriously difficult to fine-tune and slow to converge, especially when the functions are not well-conditioned.…”
Section: Related Work
confidence: 99%
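To illustrate the conditioning point made in this excerpt, here is a small numerical sketch (not taken from the cited works): plain fixed-step gradient descent on a two-dimensional quadratic whose Hessian has condition number 100. The matrix, step size, and iteration count are arbitrary choices for illustration.

import numpy as np

def gradient_descent(A, x0, lr, steps):
    """Minimize f(x) = 0.5 * x^T A x with fixed-step gradient descent."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * (A @ x)   # gradient of f is A x
    return x

kappa = 100.0                  # condition number of the quadratic
A = np.diag([1.0, kappa])
x0 = np.array([1.0, 1.0])

# A step size of 1/kappa keeps the steep direction stable, but progress along
# the flat direction shrinks only by a factor of (1 - 1/kappa) per step.
x = gradient_descent(A, x0, lr=1.0 / kappa, steps=500)
print(np.linalg.norm(x))       # still roughly 7e-3 after 500 steps

The slow decay along the small-eigenvalue direction is exactly the "not well-conditioned" behavior the excerpt refers to, and it motivates the preconditioned and adaptive variants surveyed in the cited paper.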