2018 · Preprint
DOI: 10.48550/arxiv.1806.01520

On $\ell_p$-hyperparameter Learning via Bilevel Nonsmooth Optimization

Abstract: We propose a bilevel optimization strategy for selecting the best hyperparameter value for the nonsmooth $\ell_p$ regularizer with $0 < p \leq 1$. The bilevel optimization problem of interest has a nonsmooth, possibly nonconvex, $\ell_p$-regularized problem as its lower-level problem. Despite the recent popularity of the nonconvex $\ell_p$ regularizer and the usefulness of bilevel optimization for selecting hyperparameters, algorithms for such bilevel problems have not been studied owing to the difficulty of handling the $\ell_p$ regularizer. We firs…
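As a rough, self-contained illustration of the kind of bilevel setup the abstract describes, here is a minimal Python sketch: the lower level approximately solves a smoothed $\ell_p$-regularized least-squares problem, and the upper level picks the regularization weight that minimizes validation error. The smoothing $(\omega_i^2 + \mu^2)^{p/2}$, the gradient-descent solver, and the grid search are all illustrative assumptions, not the paper's actual algorithm (which the truncated abstract does not fully specify).

```python
import numpy as np

# Hypothetical sketch (not the paper's algorithm): bilevel selection of the
# regularization weight lam for an l_p-regularized least-squares problem,
# 0 < p <= 1, using the C^2 surrogate (w_i^2 + mu^2)^(p/2) of |w_i|^p.

def lower_level(A, b, lam, p=0.5, mu=1e-3, lr=1e-3, iters=5000):
    """Approximately solve min_w 0.5*||Aw - b||^2 + lam * sum_i (w_i^2 + mu^2)^(p/2)
    by gradient descent on the smoothed objective."""
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        grad_fit = A.T @ (A @ w - b)
        grad_reg = p * w * (w ** 2 + mu ** 2) ** (p / 2 - 1)  # d/dw of surrogate
        w -= lr * (grad_fit + lam * grad_reg)
    return w

def select_lambda(A_tr, b_tr, A_val, b_val, lams):
    """Upper level: choose the lam whose lower-level solution minimizes
    validation error (a crude grid-search stand-in for a bilevel solver)."""
    best_lam, best_err = None, np.inf
    for lam in lams:
        w = lower_level(A_tr, b_tr, lam)
        err = 0.5 * np.sum((A_val @ w - b_val) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((80, 20))
    w_true = np.zeros(20)
    w_true[:3] = [2.0, -1.5, 1.0]                 # sparse ground truth
    b = A @ w_true + 0.1 * rng.standard_normal(80)
    lam, err = select_lambda(A[:60], b[:60], A[60:], b[60:],
                             lams=np.logspace(-3, 1, 9))
    print(f"selected lambda = {lam:.4g}, validation error = {err:.4g}")
```

A grid search stands in for the upper-level solver here; the appeal of a genuine bilevel method is precisely that it can optimize the hyperparameter directly instead of enumerating candidate values.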

Cited by 13 publications (36 citation statements) · References 31 publications
“…Bilevel optimization has been widely applied to various machine learning applications. Hyperparameter optimization (Lorraine and Duvenaud 2018; Okuno, Takeda, and Kawana 2018; Franceschi et al. 2018) uses bilevel optimization extensively. Besides, the idea of bilevel optimization has also been applied to meta learning (Zintgraf et al. 2019; Song et al. 2019; Soh, Cho, and Cho 2020), neural architecture search (Liu, Simonyan, and Yang 2018; Wong et al. 2018; Xu et al. 2019), adversarial learning (Tian et al. 2020; Yin et al. 2020; Gao et al. 2020), deep reinforcement learning (Yang et al. 2018; Tschiatschek et al. 2019), etc.…”
Section: Related Work
confidence: 99%
“…where $\zeta_i^t$ is the $i$-th element of $\zeta^t$ in the $t$-th fold. The linear programs (LPs) (15), for $t = 1, \ldots$…”
Section: Combining With (−A…
confidence: 99%
“…By eliminating $\lambda^t$ and $w^t$ with $w^t = (B^t)^{\top}\alpha^t$ in (4c), we get the reduced KKT conditions for problem (15) with…”
Section: Combining With (−A…
confidence: 99%
“…Nevertheless, they showed that the $\ell_{1/2}$-norm has better denoising performance than the $\ell_1$-norm. Recently, [23] considered the bilevel program (3) with the function $R_1(\omega) := \sum_{i=1}^{n} |\omega_i|^p$ ($0 < p \leq 1$), i.e., the $\ell_p$-regularizer, by employing a smoothing method via the twice continuously differentiable function…”
Section: Introduction
confidence: 99%
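The snippet above is cut off just before the smoothing function itself. For context, one standard twice continuously differentiable smoothing of the $\ell_p$ term, shown here only as a plausible example rather than the specific choice made in [23], is:

```latex
% A common C^2 smoothing of R_1(\omega) = \sum_{i=1}^n |\omega_i|^p, 0 < p <= 1:
% replace |\omega_i| with \sqrt{\omega_i^2 + \mu^2} for a smoothing parameter \mu > 0.
R_{1,\mu}(\omega) := \sum_{i=1}^{n} \bigl(\omega_i^2 + \mu^2\bigr)^{p/2},
\qquad R_{1,\mu}(\omega) \to R_1(\omega) \quad \text{as } \mu \downarrow 0.
```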