“…The classical HPO problem is defined as λ* ∈ arg min_{λ ∈ Λ} GE(I, J, ρ, λ), i.e., the goal is to minimize the estimated generalization error when the learner I, the resampling splits J, and the performance measure ρ are fixed; see [1] for further details. Instead of optimizing only for predictive performance, other metrics such as model sparsity or computational efficiency of prediction (e.g., MACs and FLOPs, or model size and memory usage) could be included, resulting in a multiobjective HPO problem [37]–[41]. The resulting objective c(λ) is a black-box function: it usually has no closed-form mathematical representation, and analytic gradient information is generally not available.…”
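The black-box, multiobjective setting described above can be sketched with a gradient-free search that only queries candidate configurations and keeps the non-dominated ones. The following is a minimal illustration, not any method from the cited works: the search space, the toy error surface, and the size objective are all hypothetical stand-ins for a real learner's estimated generalization error and model-size metric.

```python
import random

def evaluate(lam):
    """Black-box evaluation of a candidate λ = (depth, width).

    Returns (synthetic generalization-error estimate, parameter count).
    No closed form is assumed by the optimizer; it only sees the outputs.
    """
    depth, width = lam
    # Toy error surface: models that are too small or too large do worse.
    error = abs(depth - 4) * 0.05 + abs(width - 64) * 0.001
    size = depth * width * width  # rough parameter count of an MLP
    return error, size

def pareto_front(points):
    """Keep candidates not dominated in all objectives (minimization)."""
    front = []
    for lam, objs in points:
        dominated = any(
            other != objs and all(o2 <= o1 for o1, o2 in zip(objs, other))
            for _, other in points
        )
        if not dominated:
            front.append((lam, objs))
    return front

random.seed(0)
# 50 random draws from a hypothetical search space Λ
space = [(random.randint(1, 8), random.choice([16, 32, 64, 128]))
         for _ in range(50)]
results = [(lam, evaluate(lam)) for lam in space]
front = pareto_front(results)
```

Because neither objective exposes gradients, random search with a Pareto filter is one of the simplest valid strategies; Bayesian optimization or evolutionary methods would replace the random draws while keeping the same black-box interface.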