2020
DOI: 10.1142/s0218213020500104
Variance Counterbalancing for Stochastic Large-scale Learning

Abstract: Stochastic Gradient Descent (SGD) is perhaps the most frequently used method for large-scale training. A common example is training a neural network over a large data set, which amounts to minimizing the corresponding mean squared error (MSE). Since the convergence of SGD is rather slow, acceleration techniques based on the notion of "mini-batches" have been developed. All of them, however, mimicking SGD, impose diminishing step sizes as a means to inhibit large variations in the MSE objective. In this article…
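The baseline the abstract refers to can be summarized in a few lines of code. The following is a minimal sketch of plain mini-batch SGD on an MSE objective with a diminishing step size; the synthetic data, the 1/(1+t) schedule, and all hyperparameters are illustrative assumptions, and the sketch does not implement the Variance Counterbalancing method proposed in the article.

```python
# Illustrative sketch (not the paper's method): plain mini-batch SGD on a
# least-squares / MSE objective with a diminishing step-size schedule.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X w_true + noise (assumed setup for the demo).
n_samples, n_features = 10_000, 20
X = rng.standard_normal((n_samples, n_features))
w_true = rng.standard_normal(n_features)
y = X @ w_true + 0.1 * rng.standard_normal(n_samples)

def mse(w):
    """Mean squared error over the full data set."""
    return np.mean((X @ w - y) ** 2)

def minibatch_sgd(eta0=0.1, batch_size=64, n_steps=2_000):
    w = np.zeros(n_features)
    for t in range(n_steps):
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)  # MSE gradient on the mini-batch
        eta = eta0 / (1.0 + t)                          # diminishing step size (assumed schedule)
        w -= eta * grad
    return w

w_hat = minibatch_sgd()
print(f"final MSE: {mse(w_hat):.4f}")
```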

Cited by 3 publications (1 citation statement)
References 8 publications
“…Optimizer: It is an algorithm that helps to minimize the loss function during training. In our study, we have used the Adam optimizer as it helps the model to converge faster without getting stuck in a suboptimal solution [34]. 5.…”
Section: Hyperparameter Tuning
confidence: 99%
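The citing study reports using the Adam optimizer but does not state the framework or hyperparameters. The sketch below shows the standard Adam update rule with its usual default settings (lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8) applied to a toy MSE problem, purely as an illustration of the optimizer being referenced; nothing here is taken from the citing study itself.

```python
# Hedged sketch of the Adam update rule on a toy MSE objective; data and
# hyperparameters are assumptions, not the citing study's setup.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1_000, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(1_000)

def adam_on_mse(lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
                n_steps=5_000, batch_size=32):
    w = np.zeros(10)
    m = np.zeros_like(w)  # first-moment (mean) estimate
    v = np.zeros_like(w)  # second-moment (uncentred variance) estimate
    for t in range(1, n_steps + 1):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        g = 2.0 / batch_size * X[idx].T @ (X[idx] @ w - y[idx])  # mini-batch MSE gradient
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias-corrected first moment
        v_hat = v / (1 - beta2**t)  # bias-corrected second moment
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_hat = adam_on_mse()
print("final MSE:", np.mean((X @ w_hat - y) ** 2))
```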