2020
DOI: 10.48550/arxiv.2001.06472
Preprint
Gradient descent with momentum --- to accelerate or to super-accelerate?

Abstract: We consider gradient descent with 'momentum', a widely used method for loss function minimization in machine learning. This method is often used with 'Nesterov acceleration', meaning that the gradient is evaluated not at the current position in parameter space, but at the estimated position after one step. In this work, we show that the algorithm can be improved by extending this 'acceleration': using the gradient at an estimated position several steps ahead rather than just one step ahead. How far one looks …
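The abstract describes a one-parameter generalization of Nesterov momentum. Below is a minimal sketch of the idea in Python, assuming the lookahead point takes the form x + sigma*beta*v; the parameter name sigma and the exact extrapolation rule are assumptions here, since only the abstract is visible. With this form, sigma = 0 recovers classical momentum and sigma = 1 recovers standard Nesterov acceleration.

```python
import numpy as np

def super_accelerated_momentum(grad, x0, lr=0.01, beta=0.9,
                               sigma=1.0, n_steps=1000):
    """Gradient descent with momentum, evaluating the gradient at a point
    extrapolated roughly `sigma` momentum steps ahead (a sketch of the
    'super-acceleration' idea; not the paper's exact formulation)."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                     # velocity (momentum buffer)
    for _ in range(n_steps):
        lookahead = x + sigma * beta * v     # extrapolate sigma steps ahead
        v = beta * v - lr * grad(lookahead)  # velocity update uses lookahead gradient
        x = x + v                            # take the step
    return x

# Usage: minimize a simple ill-conditioned quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x_min = super_accelerated_momentum(grad, x0=[5.0, 5.0], sigma=3.0)
print(x_min)  # approaches the minimizer at the origin
```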

Cited by 2 publications (2 citation statements) · References 17 publications
“…We curated a training dataset comprising pathogenic mutations, genetic variants with clinical significance, plus benign missense variants from Online Mendelian Inheritance in Man (OMIM) 28 and ClinVar datasets 202. We implemented the model and training algorithms using TensorFlow, following best practices for data automation, model tracking, performance monitoring, and model retraining 86. We used a stochastic gradient descent with momentum algorithm 214 to update the model's parameters at an initial learning rate of 7e-5 (momentum=0.9) 215. We applied early stopping with hinge loss 216 when training on these datasets, using loss function optimization as the monitored metric to avoid overfitting, and experimented with different architectures to study the model's accuracy.…”
Section: PG-GWAS Is Accurate and Stable During Training (mentioning)
confidence: 99%
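The training recipe quoted above maps directly onto standard TensorFlow/Keras APIs. A minimal sketch, assuming a generic Keras model and dataset; the architecture, the dataset objects, and the early-stopping settings below are illustrative assumptions, not the cited paper's actual code. Only the optimizer hyperparameters (learning rate 7e-5, momentum 0.9) and the hinge loss come from the quoted statement.

```python
import tensorflow as tf

# Hypothetical stand-in for the cited model; the paper's architecture is not shown.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(100,)),
    tf.keras.layers.Dense(1, activation="tanh"),  # hinge loss expects scores in [-1, 1]
])

# SGD with momentum at the quoted hyperparameters.
optimizer = tf.keras.optimizers.SGD(learning_rate=7e-5, momentum=0.9)
model.compile(optimizer=optimizer, loss="hinge")

# Early stopping monitors the validation loss to avoid overfitting.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

# train_ds / val_ds are assumed tf.data.Dataset objects of (features, ±1 labels):
# model.fit(train_ds, validation_data=val_ds, epochs=200, callbacks=[early_stop])
```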
“…Given that the NES is affected by more than just the deleterious effect of a genetic variant, we expected to observe that some patients with longer CAG repeats receive a higher NES. This difference is partly explained by the stochastic nature of the trained networks that predict the deleterious effect of genetic variants 214,219. More importantly, the NES is updated by pooling information from the entire network (i.e., the deleterious scores of neighboring nodes in the graph), so it is not independent of other variants in the genome.…”
Section: Precise Graph-Based Annotation (mentioning)
confidence: 99%
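The neighbor-pooling update described in this statement can be illustrated with a small, purely hypothetical sketch: each node's score is blended with the mean score of its graph neighbors, so no variant's score is independent of the rest. The function, the blending weight alpha, and the toy graph are assumptions for illustration; the cited paper's actual NES update is not shown here.

```python
import numpy as np

def pool_scores(adjacency, node_scores, alpha=0.5):
    """Blend each node's score with the mean score of its neighbors
    (illustrative only; not the cited paper's NES computation)."""
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1                               # guard isolated nodes
    neighbor_mean = (adjacency @ node_scores) / deg.ravel()
    return (1 - alpha) * node_scores + alpha * neighbor_mean

# Toy graph: 3 variants, variant 0 linked to variants 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
scores = np.array([0.9, 0.2, 0.1])
print(pool_scores(A, scores))  # variant 0's score is pulled toward its neighbors'
```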