2020
DOI: 10.48550/arxiv.2007.07869
Preprint

Gradient-based Hyperparameter Optimization Over Long Horizons

Abstract: Gradient-based hyperparameter optimization is an attractive way to perform meta-learning across a distribution of tasks, or improve the performance of an optimizer on a single task. However, this approach has been unpopular for tasks requiring long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn hyperparameters online or split the horizon into smaller chunks. However, this introduces greediness which comes with a large performance drop, sinc…
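To make the memory-scaling issue the abstract raises concrete, here is a minimal sketch (not the authors' code; the toy quadratic task, the variable names, and the choice of tuning only a learning rate are all illustrative assumptions) of reverse-mode hypergradient computation through T unrolled SGD steps in JAX. Because reverse mode must retain every intermediate parameter state of the inner run, memory grows linearly with the horizon length T, which is why long horizons are problematic and why online or chunked (greedy) updates are the common workaround.

import jax
import jax.numpy as jnp

def train_loss(w, x, y):
    # toy quadratic training objective (illustrative assumption)
    return jnp.mean((x @ w - y) ** 2)

def val_loss(w, xv, yv):
    # held-out objective used as the meta-objective
    return jnp.mean((xv @ w - yv) ** 2)

def unrolled_val_loss(log_lr, w0, x, y, xv, yv, T=100):
    # differentiate through T SGD steps; reverse mode stores all T
    # intermediate parameter states, so memory scales with the horizon T
    lr = jnp.exp(log_lr)  # parameterize in log space to keep the rate positive
    w = w0
    for _ in range(T):
        w = w - lr * jax.grad(train_loss)(w, x, y)
    return val_loss(w, xv, yv)

# hypergradient d(validation loss) / d(log learning rate) over the full horizon
hypergrad_fn = jax.grad(unrolled_val_loss)

key = jax.random.PRNGKey(0)
x, xv = jax.random.normal(key, (2, 32, 5))
y, yv = x @ jnp.ones(5), xv @ jnp.ones(5)
print(hypergrad_fn(jnp.log(0.1), jnp.zeros(5), x, y, xv, yv))

Splitting the T steps into chunks and updating the learning rate after each chunk keeps memory bounded but optimizes a short-horizon objective, which is the greediness the abstract refers to.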

Cited by 4 publications (5 citation statements)
References 15 publications
“…As future work, our myopic approach could be extended to longer horizons by incorporating the principles of recent work by Micaelli & Storkey (2020), which presents a promising research direction. We also depend inconveniently on meta-hyperparameters, which are not substantially tuned.…”
Section: Discussion
confidence: 99%
“…Modern developments include theoretical reformulations of bilevel optimisation to improve performance (Liu et al., 2020; Li et al., 2020), optimising distinct hyperparameters for each model parameter (Lorraine et al., 2019; Jie et al., 2020), and computing forward-mode hypergradient averages using more exact techniques than we do (Micaelli & Storkey, 2020). Although these approaches increase computational efficiency and the range of tunable parameters, achieving both benefits at once remains challenging.…”
Section: Reinterpretation Of Iterative Optimisation
confidence: 99%
“…There have also been attempts to use forward-mode gradient accumulation for hyperparameter optimization (Franceschi et al, 2017), which is only tractable when the hyperparameter dimensionality is very small (e.g., < 10). Most gradient-based approaches perform online, joint optimization over the model parameters and hyperparameters; a notable exception is Micaelli & Storkey (2020), that performs offline updates after each full inner optimization run. Black-box approaches typically do not scale well beyond ∼ 10 hyperparameters.…”
Section: ĝES-Single
confidence: 99%
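For context on the forward-mode accumulation contrasted with offline updates in the statement above, the sketch below (an assumed toy setup, not code from Franceschi et al. (2017) or Micaelli & Storkey (2020)) carries the tangent z_t = dw_t/d(lr) alongside training. Memory stays constant in the horizon length, but one such tangent is required per hyperparameter dimension, which is why forward-mode accumulation is only tractable when the hyperparameter dimensionality is very small.

import jax
import jax.numpy as jnp

def train_loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

def val_loss(w, xv, yv):
    return jnp.mean((xv @ w - yv) ** 2)

def forward_mode_hypergrad(lr, w0, x, y, xv, yv, T=100):
    # carry z_t = dw_t/d(lr) alongside training: constant memory in T,
    # but one tangent like z is needed per hyperparameter dimension
    grad_fn = lambda w_: jax.grad(train_loss)(w_, x, y)
    w, z = w0, jnp.zeros_like(w0)
    for _ in range(T):
        # g = training gradient at w, hvp = Hessian-vector product H(w) @ z
        g, hvp = jax.jvp(grad_fn, (w,), (z,))
        z = z - g - lr * hvp  # derivative w.r.t. lr of the update w <- w - lr * g
        w = w - lr * g
    # chain rule: dL_val/d(lr) = (dL_val/dw) . (dw/d(lr))
    return jnp.dot(jax.grad(val_loss)(w, xv, yv), z)

key = jax.random.PRNGKey(0)
x, xv = jax.random.normal(key, (2, 32, 5))
y, yv = x @ jnp.ones(5), xv @ jnp.ones(5)
print(forward_mode_hypergrad(0.1, jnp.zeros(5), x, y, xv, yv))

The Hessian-vector product is obtained with a single jax.jvp through the gradient function, so each step costs roughly one extra backward pass per hyperparameter rather than storing the whole trajectory.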
“…Gradient-based methods were developed to access this structure, making them scalable to tune millions of hyperparameters within deep architectures [10,29]. Gradients have been implemented extensively within HO, including to learning rates [9,30,31], regularization coefficients [10,32,11], and neural architecture search [33]. Another class of algorithms view the HO problem as a gray-box by considering the inner optimization structure, but relax the need for the meta-objective to be differentiable: population-based training [34], hypernetwork-based HO [12,35], and persistent evolution strategies (PES) [13] are examples of gray-box approaches.…”
Section: Related Work
confidence: 99%