Online Learning Rate Adaptation with Hypergradient Descent
2017 · Preprint
DOI: 10.48550/arxiv.1703.04782

Cited by 14 publications (20 citation statements)
References 0 publications
“…MLDG [30] introduces MAML to domain generalization by modifying the objective function. Alpha MAML [31] combines MAML with an online hyperparameter adaptation approach [32], thus avoiding manual tuning of the learning rates. MAML++ [33] modifies MAML in several aspects and improves its stability and performance.…”
Section: B Gradient Based Meta-learning
confidence: 99%
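The online hyperparameter adaptation approach cited above [32] is the hypergradient descent method of this preprint: the learning rate is itself updated by gradient descent, using the dot product of the current and previous loss gradients. Below is a minimal NumPy sketch of that update rule; the function name sgd_with_hypergradient, the generic grad_fn, and the constants are illustrative assumptions, not taken from any of the cited papers.

import numpy as np

def sgd_with_hypergradient(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """Plain SGD in which the learning rate alpha is adapted online:
    alpha moves along the dot product of the current and previous
    gradients (the hypergradient), as in Baydin et al., 2018."""
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # The hypergradient of the loss w.r.t. alpha is -g . prev_grad,
        # so descending on it increases alpha when successive gradients agree.
        alpha = alpha + beta * np.dot(g, prev_grad)
        theta = theta - alpha * g  # ordinary SGD step with the adapted rate
        prev_grad = g
    return theta, alpha

Only the previous gradient needs to be retained between steps, which is the memory property the excerpts below rely on when scaling gradient-based hyperparameter optimization.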
“…We thus do not store network weights at multiple time steps, so gradient-based HPO becomes possible on previously intractable large-scale problems. In essence, we develop an approximation to online hypergradient descent (Baydin et al., 2018).…”
Section: Hyperparameter Updates
confidence: 99%
“…We demonstrate that our algorithm handles a range of hyperparameter initialisations and datasets, improving test loss after a single training episode ('one pass'). Relaxing the exactness of differentiation-through-optimisation (Domke, 2012) and hypergradient descent (Baydin et al., 2018) allows us to improve computational and memory efficiency. Our scalable one-pass method improves performance from arbitrary hyperparameter initialisations, and could be augmented with a further search over those initialisations if desired.…”
Section: Introduction
confidence: 99%
“…One interesting direction is to simultaneously optimize configurations and parameters; such methods have recently been explored in NAS [46], [201] and in automated step-size search for SGD [202], and have been shown to be much more efficient than previous state-of-the-art approaches.…”
Section: Techniques
confidence: 99%