2022
DOI: 10.3390/app122311976

On the Relative Impact of Optimizers on Convolutional Neural Networks with Varying Depth and Width for Image Classification

Abstract: The continued increase in computing resources is one key factor allowing deep learning researchers to scale, design, and train new and complex convolutional neural network (CNN) architectures of varying width, depth, or both to improve performance on a variety of problems. The contributions of this study include an uncovering of how different optimization algorithms impact CNN architectural setups with variations in width, depth, and both width/depth. Specifically, in this study…
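To make the notion of varying a CNN's depth and width concrete, here is a minimal Keras sketch; the helper name, layer choices, and default values are illustrative assumptions, not the architectures evaluated in the paper:

```python
# Minimal sketch (not the paper's exact architectures): a CNN whose
# depth (number of conv blocks) and width (filters per block) are parameters.
import tensorflow as tf

def build_cnn(depth=3, width=32, num_classes=10, input_shape=(32, 32, 3)):
    """Stack `depth` conv blocks, each `width` filters wide (assumed setup)."""
    model = tf.keras.Sequential([tf.keras.layers.Input(shape=input_shape)])
    for _ in range(depth):
        model.add(tf.keras.layers.Conv2D(width, 3, padding="same", activation="relu"))
        model.add(tf.keras.layers.MaxPooling2D())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    return model

# Example: a deeper variant and a wider variant of the same base network.
deeper = build_cnn(depth=5, width=32)
wider = build_cnn(depth=3, width=128)
```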

Cited by: 9 publications (6 citation statements)
References: 34 publications
“…This suggests that the extracted features and the classifier were invariant to the internal implementation and thus suitable for the task. It is worth pointing out that, on the contrary, selecting different loss functions and optimizers in a CNN-based DL model can vastly affect its performance [21,22]. Finally, automated kernel scaling was activated for all kernels.…”
Section: Results
Citation type: Mentioning (confidence: 99%)
“…Among the optimizers most commonly used in prediction and classification tasks, SGD, RMSprop, Adadelta, and Adam were selected for study. We therefore expanded our analysis using the Adam, Adadelta, RMSprop, and SGD optimizers [10,27–29,33–36]. We considered the merits of each optimizer, including the SGD [33], RMSprop [29], Adam [37], and Adadelta [29] formulae.…”
Section: Optimizer, Learning Rate, and Batch Size
Citation type: Mentioning (confidence: 99%)
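The kind of optimizer comparison the citing authors describe could be sketched as follows; the model, dataset (CIFAR-10), learning rates, and epoch count are assumptions for illustration only, not values taken from the cited works:

```python
# Sketch: train the same small (assumed) CNN under the four optimizers named
# above and compare final validation accuracy; hyper-parameters are illustrative.
import tensorflow as tf

def make_model():
    # Small illustrative CNN, not the architectures from the paper.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

optimizers = {
    "SGD":      tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9),
    "RMSprop":  tf.keras.optimizers.RMSprop(learning_rate=1e-3),
    "Adadelta": tf.keras.optimizers.Adadelta(learning_rate=1.0),
    "Adam":     tf.keras.optimizers.Adam(learning_rate=1e-3),
}

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

results = {}
for name, opt in optimizers.items():
    model = make_model()
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=5, batch_size=128,
                        validation_data=(x_test, y_test), verbose=0)
    results[name] = history.history["val_accuracy"][-1]

print(results)  # final validation accuracy per optimizer
```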
“…P is an argument responsible for keeping the spatial sizes fixed after the convolution operation by adding rows and columns of zero values. P has two settings: valid (no padding) and same (zero padding) [16,17]. Moreover, D is a hyper-parameter that adjusts the moving averages.…”
Section: SSD's Hyper-parameters
Citation type: Mentioning (confidence: 99%)
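The valid-versus-same padding behaviour mentioned in this statement can be illustrated with a short Keras sketch; the framework, kernel size, and input shape are chosen purely for illustration and are not taken from the cited work:

```python
# Sketch: 'same' padding adds rows/columns of zeros so the spatial size is
# preserved, while 'valid' applies no padding and shrinks the output.
import tensorflow as tf

x = tf.random.normal([1, 28, 28, 3])                      # one 28x28 RGB image

same = tf.keras.layers.Conv2D(8, 5, padding="same")(x)    # zero padding
valid = tf.keras.layers.Conv2D(8, 5, padding="valid")(x)  # no padding

print(same.shape)   # (1, 28, 28, 8)  -> spatial size unchanged
print(valid.shape)  # (1, 24, 24, 8)  -> 28 - 5 + 1 = 24
```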