2022
DOI: 10.3390/math10060863
aSGD: Stochastic Gradient Descent with Adaptive Batch Size for Every Parameter

Abstract: In recent years, deep neural networks (DNNs) have been widely used in many fields. Training them requires substantial effort because of the large number of parameters in a deep network. Complex optimizers with many hyperparameters have been employed to accelerate training and improve generalization, but tuning these hyperparameters is often a trial-and-error process. In this paper, we analyze the different roles of training samples on a parameter update, vis…
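The abstract is truncated here, so the paper's aSGD update rule cannot be reconstructed from it. As a point of reference only, the sketch below shows a plain fixed-batch minibatch SGD step, the baseline that aSGD (per the title) modifies by adapting the batch per parameter; `grad_fn`, `data`, and `batch_size` are hypothetical placeholders, not names from the paper.

```python
import numpy as np

def sgd_step(params, grad_fn, data, batch_size, lr=0.01, rng=None):
    """One plain minibatch SGD update: draw a fixed-size batch, average the
    per-sample gradients, and apply the same step to every parameter.
    (aSGD, per the title, instead adapts the batch size per parameter;
    that logic is not reproduced here.)"""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(data), size=batch_size, replace=False)      # one shared batch
    grad = np.mean([grad_fn(params, data[i]) for i in idx], axis=0)  # averaged gradient
    return params - lr * grad  # identical learning rate and batch for all parameters
```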

Cited by 7 publications (2 citation statements)
References 28 publications
“…SGD has emerged as a standard method for optimizing various types of deep neural networks, primarily because of its capacity to escape local minima like ASGD (verifiable with the best training times in Table 4) [48,49], and its efficiency for large-scale datasets, making it ideal for linear classification problems related to our database's nature [50]. Additionally, in recent works, SGD has proven to be an excellent optimizing algorithm for suspended solids and turbidity estimation using CNN, presenting R² value of 0.931 [26],…”
Section: Discussion
confidence: 99%
“…• To address the problem of optimization in determining the best privacy parameter ϵ for Blockchain technology that incorporates differential privacy mechanisms, we suggest that researchers develop new methods for blockchain to achieve security based on optimization algorithms and differential privacy when industrial data is publicly published. We can, for example, utilize an optimization approach like Stochastic Gradient Descent (SGD) [186], which may aid in determining the ideal privacy parameter ϵ by executing parameter changes for each training sample while ensuring data usefulness. Additionally, businesses might consider developing and designing solutions that will assist in the integration and adaptation of this new type of security and privacy mechanisms in IIoT industries.…”
Section: B. Proposed Solutions
confidence: 99%
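The statement above describes using SGD-style per-sample updates to search for a differential-privacy parameter ε. The snippet below is only a minimal sketch of that pattern under assumed names: `loss_grad(eps, x)` stands for the gradient of a hypothetical utility-versus-privacy objective for one data point, and the clipping range is an arbitrary choice, not something specified by either paper.

```python
import numpy as np

def tune_epsilon(loss_grad, samples, eps_init=1.0, lr=0.05, lo=0.1, hi=10.0):
    """Per-sample SGD on a scalar privacy budget: update epsilon after every
    training sample using the gradient of an assumed utility-vs-privacy loss,
    keeping it inside a plausible range."""
    eps = eps_init
    for x in samples:
        eps -= lr * loss_grad(eps, x)      # one parameter change per training sample
        eps = float(np.clip(eps, lo, hi))  # keep epsilon within valid bounds
    return eps
```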