2019
DOI: 10.48550/arxiv.1910.11858
Preprint

BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search

Abstract: Neural Architecture Search (NAS) has seen an explosion of research in the past few years. A variety of methods have been proposed to perform NAS, including reinforcement learning, Bayesian optimization with a Gaussian process model, evolutionary search, and gradient descent. In this work, we design a NAS algorithm that performs Bayesian optimization using a neural network model. We develop a path-based encoding scheme to featurize the neural architectures that are used to train the neural network model. This st…

Cited by 39 publications (114 citation statements)
References 30 publications (90 reference statements)
“…It uses Gaussian processes to learn the posterior distribution of the objective function, which is then used to construct an acquisition function to determine the next trial (Snoek et al., 2012). BO is widely used in NAS (White et al., 2019), deep learning hyperparameter tuning (Golovin et al., 2017; Shahriari et al., 2015), system optimization (Lagar-Cavilla et al., 2019; Dalibard et al., 2017), model selection (Malkomes et al., 2016), transfer learning (Ruder & Plank, 2017), and many more (Archetti & Candelieri, 2019; Srinivas et al., 2009; Hutter et al., 2011; Snoek et al., 2012; Wilson et al., 2016) for optimizing with limited computing and time budgets.…”
Section: NAS for Student Models (mentioning)
confidence: 99%
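The loop described in this excerpt can be sketched in a few lines. The example below is a minimal illustration rather than code from any of the cited works: a Gaussian process models the objective, and an expected-improvement acquisition function picks the next trial point. The toy objective, search bounds, candidate pool size, and hyperparameters are assumptions made for the sketch.

```python
# Minimal sketch of GP-based Bayesian optimization:
# fit a posterior, score candidates with expected improvement, evaluate the best one.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Toy 1-D objective to minimize (stand-in for, e.g., validation error).
    return np.sin(3 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
bounds = (-2.0, 2.0)

# Initial design: a few random evaluations.
X = rng.uniform(*bounds, size=(5, 1))
y = f(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    # EI for minimization, computed from the GP's predictive mean and std.
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

for _ in range(20):
    gp.fit(X, y)                                    # posterior over the objective
    X_cand = rng.uniform(*bounds, size=(256, 1))    # random candidate pool
    ei = expected_improvement(X_cand, gp, y.min())
    x_next = X_cand[np.argmax(ei)].reshape(1, 1)    # next trial
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).item())

print("best x:", X[np.argmin(y)].item(), "best f:", y.min())
```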
“…We then benchmarked the following HPO methods on the real, surrogate, and tabular benchmarks: random search (RS), Bayesian optimization (BO), and Hyperband (HB, [5]). BO is configured with a surrogate model that is either a Gaussian process (BO GP), an ensemble of feed-forward neural networks (NN, [51]), or a random forest (BO RF, [52]), and with an acquisition-function optimizer that is either Nelder-Mead/exhaustive search (* DF [53]) or random search (* RS). See Appendix A.3 for more details.…”
Section: Empirical Investigations (mentioning)
confidence: 99%
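The "BO RF" variant mentioned in this excerpt replaces the Gaussian process with a random forest surrogate. Below is a minimal sketch of that idea under the assumption that the spread of per-tree predictions is used as the uncertainty estimate; the benchmark's actual implementation in [52] may differ, and the data here is synthetic.

```python
# Random-forest surrogate sketch: the ensemble of trees provides both a mean
# prediction and a (heuristic) uncertainty estimate from per-tree disagreement.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_predict_with_std(rf, X):
    # Collect each tree's prediction, then report mean and spread.
    per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 1))
y = np.sin(3 * X).ravel() + 0.1 * rng.normal(size=30)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
mu, sigma = rf_predict_with_std(rf, rng.uniform(-2, 2, size=(5, 1)))
print("mean:", mu, "std:", sigma)
```

The mean and standard deviation returned here would plug into an acquisition function exactly as the GP's predictive moments do in the sketch above.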
“…Neural architecture search (NAS) methods can be categorized along three dimensions (Elsken et al., 2019a): search space, search strategy, and performance estimation strategy. Focusing on search strategy, popular methods are given by Bayesian optimization (BO, e.g., Bergstra et al. 2013; Domhan et al. 2015; Mendoza et al. 2016; Kandasamy et al. 2018; White et al. 2019), evolutionary methods (e.g., Miller et al. 1989; Liu et al. 2017; Real et al. 2017; Elsken et al. 2019b), reinforcement learning (RL, e.g., Zoph and Le 2017), and gradient-based algorithms (e.g., Liu et al. 2019; Pham et al. 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Within the BO framework, BANANAS (White et al., 2019) has emerged as one state-of-the-art algorithm (White et al., 2019; Siems et al., 2020; Guerrero-Viu et al., 2021; White et al., 2021). The two main components of BANANAS are a (truncated) path encoding, where architectures represented as directed acyclic graphs (DAGs) are encoded based on the possible paths through that graph, and an ensemble of feed-forward neural networks as surrogate model.…”
Section: Introduction (mentioning)
confidence: 99%
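The path encoding described in this excerpt can be illustrated with a short sketch. The example below is a toy version, not the exact BANANAS implementation: the operation set, node names, and truncation length are illustrative assumptions, and only the core idea of marking which input-to-output operation sequences occur in the DAG is retained (see White et al., 2019 for the actual scheme).

```python
# Toy (truncated) path encoding: an architecture is a DAG whose edges carry
# operations; its feature vector marks which operation sequences appear on
# paths from the input node to the output node.
from itertools import product

OPS = ["conv3x3", "conv1x1", "maxpool"]   # assumed operation set
MAX_PATH_LEN = 3                          # truncation: keep paths up to this length

# Vocabulary of all operation sequences up to the truncation length.
VOCAB = [seq for L in range(1, MAX_PATH_LEN + 1) for seq in product(OPS, repeat=L)]
INDEX = {seq: i for i, seq in enumerate(VOCAB)}

def paths(dag, node, target):
    """Yield the operation sequences along all paths from `node` to `target`."""
    if node == target:
        yield ()
        return
    for nxt, op in dag.get(node, []):
        for rest in paths(dag, nxt, target):
            yield (op,) + rest

def path_encoding(dag, source="in", target="out"):
    vec = [0] * len(VOCAB)
    for seq in paths(dag, source, target):
        if seq in INDEX:                  # paths longer than MAX_PATH_LEN are dropped
            vec[INDEX[seq]] = 1
    return vec

# Example cell: edges in->a->out and in->out, each labeled with an operation.
cell = {
    "in": [("a", "conv3x3"), ("out", "maxpool")],
    "a": [("out", "conv1x1")],
}
print(sum(path_encoding(cell)), "paths set out of", len(VOCAB))
```

In BANANAS, vectors of this form are the features on which the ensemble of feed-forward neural networks is trained to predict architecture performance.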