Fast Bayesian hyperparameter optimization on large datasets
2017
DOI: 10.1214/17-ejs1335si

Cited by 107 publications (120 citation statements); References: 12 publications.
Citation types: 2 supporting, 118 mentioning, 0 contrasting.
“…are the holdout and cross-validation error for a user-given loss function (such as misclassification rate); see Bischl et al. [16] for an overview of validation protocols. Several strategies for reducing the evaluation time have been proposed: it is possible to test machine learning algorithms only on a subset of folds [149], only on a subset of the data [78,102,147], or only for a small number of iterations; we will discuss some of these strategies in more detail in Sect. 1.4.…”
Section: Popular Choices for the Validation Protocol V(·, ·, ·, ·)
Citation type: mentioning
confidence: 99%
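The subset-of-folds and subset-of-data strategies mentioned in this excerpt can be made concrete with a short sketch. The snippet below is not from the cited works; it assumes scikit-learn, NumPy arrays `X` and `y`, and a hypothetical hyperparameter dictionary `params`, and shows both cheap evaluation modes.

```python
# Minimal sketch of two low-cost "fidelities" for validating a configuration:
# (1) training on a subset of the data, (2) using only a subset of CV folds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def holdout_error_on_subset(params, X, y, subset_fraction=0.2, seed=0):
    """Misclassification rate after training on only a fraction of the data."""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    n_train = int(subset_fraction * len(X))
    train_idx, val_idx = idx[:n_train], idx[n_train:]
    model = RandomForestClassifier(**params, random_state=seed)
    model.fit(X[train_idx], y[train_idx])
    return 1.0 - accuracy_score(y[val_idx], model.predict(X[val_idx]))


def cv_error_on_subset_of_folds(params, X, y, n_folds=5, folds_used=2, seed=0):
    """Cross-validation error estimated from only the first few folds."""
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    errors = []
    for i, (train_idx, val_idx) in enumerate(cv.split(X, y)):
        if i >= folds_used:  # stop early: cheaper but noisier estimate
            break
        model = RandomForestClassifier(**params, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        errors.append(1.0 - accuracy_score(y[val_idx], model.predict(X[val_idx])))
    return float(np.mean(errors))
```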
“…Multi-task Bayesian optimization (and the methods presented in the previous subsection) requires an upfront specification of a set of fidelities. This can be suboptimal since these can be misspecified [74,78] and because the number of fidelities that can be handled is low (usually five or fewer). Therefore, and in order to exploit the typically smooth dependence on the fidelity (e.g., the size of the data subset used), it often yields better results to treat the fidelity as continuous (and, e.g., choose a continuous percentage of the full data set on which to evaluate a configuration), trading off the information gain and the time required for evaluation [78].…”
Section: Adaptive Choices of Fidelities
Citation type: mentioning
confidence: 99%
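Treating the fidelity as continuous, as the excerpt describes, amounts to making the data-subset fraction part of the optimizer's input and recording the evaluation cost alongside the loss. The sketch below is a plain illustration under assumed scikit-learn models, not the cited paper's implementation; `fidelity_objective` and its arguments are hypothetical names.

```python
# Minimal sketch of a continuous-fidelity objective: the dataset fraction
# s in (0, 1] is part of the input, and both the validation loss and the
# observed wall-clock cost are returned, so a multi-fidelity Bayesian
# optimizer can trade off information gain against evaluation time.
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def fidelity_objective(params, s, X_train, y_train, X_val, y_val, seed=0):
    """Loss and cost of training `params` on a fraction s of the training data."""
    rng = np.random.RandomState(seed)
    n_sub = max(1, int(s * len(X_train)))
    idx = rng.choice(len(X_train), size=n_sub, replace=False)

    start = time.time()
    model = RandomForestClassifier(**params, random_state=seed)
    model.fit(X_train[idx], y_train[idx])
    loss = 1.0 - accuracy_score(y_val, model.predict(X_val))
    cost = time.time() - start

    return loss, cost  # a fidelity-aware optimizer models both quantities
```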
“…Therefore, BOA can overcome the disruption of building blocks in genetic algorithms. The BOA has advantages in the optimization of machine learning algorithm hyperparameters because of its faster search speed and fewer iterations compared to traditional search algorithms [28][29][30]. In this study, the BOA is employed to optimize the parameters of Random Forest (the base model; see Section 2.3 for details) for traffic incident duration prediction, in order to achieve better prediction results.…”
Section: Methods
Citation type: mentioning
confidence: 99%
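The excerpt does not reproduce the cited study's code; the following sketch shows the general pattern of Bayesian optimization of Random Forest hyperparameters using scikit-optimize, with illustrative search ranges, a toy regression dataset standing in for incident-duration data, and a 3-fold cross-validation objective.

```python
# Minimal sketch: Gaussian-process Bayesian optimization of Random Forest
# hyperparameters with scikit-optimize. Search ranges and dataset are
# illustrative assumptions, not taken from the cited study.
from skopt import gp_minimize
from skopt.space import Integer
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)  # toy stand-in for incident-duration data

search_space = [
    Integer(50, 500, name="n_estimators"),
    Integer(2, 30, name="max_depth"),
    Integer(2, 20, name="min_samples_split"),
]


def objective(values):
    """Negative mean CV score for one setting (gp_minimize minimizes)."""
    n_estimators, max_depth, min_samples_split = values
    model = RandomForestRegressor(
        n_estimators=int(n_estimators),
        max_depth=int(max_depth),
        min_samples_split=int(min_samples_split),
        random_state=0,
    )
    return -cross_val_score(model, X, y, cv=3).mean()


result = gp_minimize(objective, search_space, n_calls=30, random_state=0)
print("best hyperparameters:", result.x)
print("best objective value:", result.fun)
```

Compared with grid or random search, the surrogate model reuses all previous evaluations to pick the next configuration, which is why such methods typically need fewer iterations on expensive objectives.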