2021
DOI: 10.48550/arxiv.2107.05847
Preprint

Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges

Abstract: Most machine learning algorithms are configured by one or several hyperparameters that must be carefully chosen and often considerably impact performance. To avoid a time-consuming and unreproducible manual trial-and-error process to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods, e.g., based on resampling error estimation for supervised machine learning, can be employed. After introducing HPO from a general perspective, this paper reviews import…

Cited by 19 publications (29 citation statements) | References 80 publications
“…Hyperparameter Search A common part of the ML pipeline is to perform some sort of hyperparameter search. The corresponding tuning strategies remain an open area of research (see Bischl et al., 2021 for a comprehensive overview), but the following rules of thumb exist: if there are very few parameters that can be searched exhaustively under the computation budget, grid search or Bayesian optimization can be applied; otherwise, random search is preferred, as it explores the search space more efficiently (Bergstra & Bengio, 2012).…”
Section: Codebase and Models (mentioning)
confidence: 99%
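To make the rule of thumb above concrete, here is a minimal sketch assuming scikit-learn and an SVM classifier (both illustrative choices, not taken from the cited works): a small, coarse space is searched exhaustively with GridSearchCV, while a larger continuous space is sampled with RandomizedSearchCV under a fixed budget, in the spirit of Bergstra & Bengio (2012).

```python
# Sketch: grid search for a small, coarse space vs. random search for a
# larger continuous space. Estimator, ranges, and budget are illustrative
# assumptions, not taken from the cited papers.
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Very few parameters with coarse levels: exhaustive grid search is feasible.
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]},
    cv=5,
)
grid.fit(X, y)

# Larger, continuous space: random search explores it more efficiently
# under a fixed evaluation budget (Bergstra & Bengio, 2012).
rand = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2),
                         "gamma": loguniform(1e-4, 1e0)},
    n_iter=25,  # fixed budget of 25 sampled configurations
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print("grid:", grid.best_params_, grid.best_score_)
print("random:", rand.best_params_, rand.best_score_)
```

The log-uniform sampling reflects the common practice of searching scale-type hyperparameters such as C and gamma on a logarithmic scale.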
“…Hyperparameter optimization (HPO) methods aim to identify a well-performing hyperparameter configuration (HPC) λ ∈ Λ for an ML algorithm I_λ [1]. An ML learner or inducer I configured by hyperparameters λ ∈ Λ maps a data set D ∈ 𝔻 to a model f, i.e., I : 𝔻 × Λ → H, (D, λ) ↦ f.…”
Section: Hyperparameter Optimization (mentioning)
confidence: 99%
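As a minimal illustration of this notation, the following sketch treats the inducer I as a plain Python function mapping a data set D = (X, y) and a configuration λ to a fitted model f; the random forest learner and its hyperparameter names are assumptions made for the example.

```python
# Sketch of the inducer notation I : 𝔻 × Λ → H, (D, λ) ↦ f.
# The learner (random forest) and hyperparameter names are assumptions.
from sklearn.base import BaseEstimator
from sklearn.ensemble import RandomForestClassifier

def inducer(D: tuple, lam: dict) -> BaseEstimator:
    """Map a data set D = (X, y) and an HPC λ to a fitted model f."""
    X, y = D
    model = RandomForestClassifier(**lam)  # I configured by λ
    return model.fit(X, y)                 # f = I(D, λ)

# Usage (hypothetical data):
# f = inducer((X_train, y_train), {"n_estimators": 200, "max_depth": 5})
```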
“…where λ* denotes the theoretical optimum and c maps an arbitrary HPC to (possibly multiple) target metrics. The classical HPO problem is defined as λ* ∈ arg min_{λ ∈ Λ} GE(I, J, ρ, λ), i.e., the goal is to minimize the estimated generalization error when I (learner), J (resampling splits), and ρ (performance measure) are fixed; see [1] for further details. Instead of optimizing only for predictive performance, other metrics such as model sparsity or the computational efficiency of prediction (e.g., MACs and FLOPs, or model size and memory usage) could be included, resulting in a multi-objective HPO problem [37–41].…”
Section: Hyperparameter Optimization (mentioning)
confidence: 99%
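A hedged sketch of the classical problem above: the estimated generalization error GE(I, J, ρ, λ) is computed by resampling with fixed splits J and measure ρ, and a plain random search stands in for the optimizer over Λ. The learner, search space, and budget are illustrative assumptions.

```python
# Sketch: λ* ∈ arg min_{λ ∈ Λ} GE(I, J, ρ, λ), with GE estimated by
# cross-validation. Learner, search space, and optimizer (plain random
# search) are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

def estimated_ge(lam, X, y, splits, rho="accuracy"):
    """Resampling-based estimate of the generalization error of I_λ."""
    scores = cross_val_score(RandomForestClassifier(**lam), X, y,
                             cv=splits, scoring=rho)
    return 1.0 - scores.mean()  # error = 1 - mean accuracy

def random_search(X, y, n_iter=20, seed=0):
    """Minimize the estimated GE over Λ with I, J, and ρ held fixed."""
    rng = np.random.default_rng(seed)
    splits = KFold(n_splits=5, shuffle=True, random_state=seed)  # fixed J
    best_lam, best_err = None, np.inf
    for _ in range(n_iter):
        lam = {"n_estimators": int(rng.integers(50, 500)),  # Λ: assumed ranges
               "max_depth": int(rng.integers(2, 20))}
        err = estimated_ge(lam, X, y, splits)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err
```

Holding J fixed across all evaluated configurations, as done here, keeps the comparison between configurations fair; a multi-objective variant would return a vector of metrics instead of the scalar error.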
“…Deep neural networks lie at the heart of many of the artificial intelligence applications that are ubiquitous in our society. Over the past several years, methods for training these networks have become more automatic [1,2,3,4,5] but still remain more an art than a science. This paper introduces the high-level concept of general cyclical training as another step in making it easier to optimally train neural networks.…”
Section: Introduction (mentioning)
confidence: 99%