2020
DOI: 10.1109/tnnls.2019.2955777
diffGrad: An Optimization Method for Convolutional Neural Networks

Cited by 201 publications (108 citation statements) · References 25 publications
“…Although the ImageNet base dataset does not include teeth images, various studies have shown that the fine-tuning of a pretrained network with several images of ImageNet helps improve the performance in disease-related problem learning using medical images [25][26][27] . The network was trained using the cross-entropy loss function and adaptive moment estimation (Adam) optimizer 28 with a learning rate of 1e-5 and a batch size of 32. We trained the network for 40,000 iterations and validated it using validation data every 1000 iterations with a classification accuracy metric to determine whether to stop the training.…”
Section: Methods (mentioning)
confidence: 99%
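The fine-tuning recipe quoted above translates into a fairly standard training loop. Below is a minimal PyTorch sketch under stated assumptions: the tiny CNN and random tensors are hypothetical stand-ins for the cited authors' pretrained network and dental-image data, and only the quoted settings (cross-entropy loss, Adam, learning rate 1e-5, batch size 32, 40,000 iterations, validation on classification accuracy every 1,000 iterations) are taken from the excerpt.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the pretrained network and the dental-image data;
# only the hyperparameters below (Adam, lr=1e-5, batch size 32, 40,000 iterations,
# validation every 1,000 iterations) come from the quoted excerpt.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
)
train_set = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 2, (256,)))
val_set = TensorDataset(torch.randn(64, 3, 64, 64), torch.randint(0, 2, (64,)))
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def accuracy(net, loader):
    """Classification accuracy used as the validation metric."""
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (net(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    net.train()
    return correct / total

step, best_acc = 0, 0.0
while step < 40_000:
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        step += 1
        if step % 1_000 == 0:
            acc = accuracy(model, val_loader)
            best_acc = max(best_acc, acc)  # in practice, stop when accuracy stops improving
        if step >= 40_000:
            break
```

The stopping rule is only sketched here as tracking the best validation accuracy; the excerpt does not specify the exact criterion beyond periodic validation.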
“…Here, kernel size and kernel count are the most influential factors for recognition or detection performance. For the hyperparameters other than kernel size and kernel count in Table 3, value ranges close to optimal have already been reported in various studies [32, 33, 34].…”
Section: Related Research (mentioning)
confidence: 90%
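The excerpt singles out kernel size and kernel count (the number of filters) as the most influential hyperparameters; the Table 3 it refers to belongs to the citing paper and is not reproduced here. Purely as an illustration, a small sweep over those two knobs for a single convolutional block might look like the following sketch, where every specific value is an assumption:

```python
import torch
from torch import nn

def make_conv_block(kernel_size: int, kernel_count: int) -> nn.Module:
    # 'kernel_count' is the number of output channels (filters) of the layer.
    return nn.Sequential(
        nn.Conv2d(3, kernel_count, kernel_size, padding=kernel_size // 2),
        nn.ReLU(),
    )

x = torch.randn(1, 3, 32, 32)          # dummy input image
for k in (3, 5, 7):                    # candidate kernel sizes
    for c in (16, 32, 64):             # candidate kernel (filter) counts
        y = make_conv_block(k, c)(x)
        print(f"kernel_size={k}, kernel_count={c}, output shape={tuple(y.shape)}")
```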
“…We compare t-Adam mainly with Adam, but also with another robust gradient-descent algorithm, RoAdam [21]; we also present a comparison between some popular or recent optimization methods (mostly variants of Adam, i.e., AdaBound [18], AdamW [31], DiffGrad [32], RAdam [33], PAdam [34], Yogi [35], and LaProp [36]) and their t-versions. Note that we are not exhaustive in our selection and that the t-momentum can be integrated into other momentum-based optimization methods.…”
Section: Methods (mentioning)
confidence: 99%
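Among the methods listed, DiffGrad [32] is the optimizer this report is about. As a rough sketch of the idea rather than the authors' reference implementation, a diffGrad-style step scales the Adam update by a friction coefficient equal to the sigmoid of the absolute difference between consecutive gradients; the function name, hyperparameter defaults, and toy objective below are illustrative assumptions.

```python
import torch

def diffgrad_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One diffGrad-style update: an Adam step scaled by sigmoid(|g_prev - g|)."""
    t = state["t"] + 1
    m = betas[0] * state["m"] + (1 - betas[0]) * grad          # first moment (as in Adam)
    v = betas[1] * state["v"] + (1 - betas[1]) * grad * grad   # second moment (as in Adam)
    m_hat = m / (1 - betas[0] ** t)                            # bias correction
    v_hat = v / (1 - betas[1] ** t)
    xi = torch.sigmoid((state["g_prev"] - grad).abs())         # friction coefficient
    param = param - lr * xi * m_hat / (v_hat.sqrt() + eps)
    state.update(m=m, v=v, g_prev=grad.clone(), t=t)
    return param

# Toy usage on a quadratic objective ||w||^2 (illustrative only):
w = torch.tensor([2.0, -3.0])
state = dict(m=torch.zeros_like(w), v=torch.zeros_like(w),
             g_prev=torch.zeros_like(w), t=0)
for _ in range(200):
    grad = 2 * w                    # gradient of ||w||^2
    w = diffgrad_step(w, grad, state)
```

With this coefficient, a parameter whose gradient barely changes between steps takes a damped update (the sigmoid approaches 0.5 at zero difference), while a sharply changing gradient restores close to the full Adam step (the sigmoid approaches 1).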
“…Proof: First, we start by noticing that the basic regret bound from the convergence proof by Reddi et al. [8] also holds for t-Adam, that is, the bound (32), where…”
Section: A Proof Of Theorem (mentioning)
confidence: 99%