2021
DOI: 10.48550/arxiv.2105.07874
Preprint

Optimal Convergence Rates for the Proximal Bundle Method

Abstract: We study convergence rates of the classic proximal bundle method for a variety of nonsmooth convex optimization problems. We show that, without any modification, this algorithm adapts to converge faster in the presence of smoothness or a Hölder growth condition. Our analysis reveals that with a constant stepsize, the bundle method is adaptive, yet it exhibits suboptimal convergence rates. We overcome this shortcoming by proposing nonconstant stepsize schemes with optimal rates. These schemes use function infor…
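For readers who want to see the mechanics behind the abstract, the sketch below illustrates one classic proximal bundle iteration: a cutting-plane model of the objective, a proximal subproblem centered at the current iterate, and the serious/null step test. It is a minimal sketch assembled from the standard textbook description, not the authors' code; the constant prox stepsize rho, the descent parameter beta, the test function, and the SLSQP-based subproblem solver are illustrative assumptions.

```python
# A minimal sketch of the classic proximal bundle method with serious/null steps,
# assembled from the standard textbook description (cutting-plane model plus a
# proximal term). The constant prox stepsize `rho`, the descent parameter `beta`,
# and the SLSQP-based subproblem solver are illustrative assumptions, not the
# paper's implementation.
import numpy as np
from scipy.optimize import minimize


def proximal_bundle(f, subgrad, x0, rho=1.0, beta=0.5, tol=1e-6, max_iter=200):
    """Minimize a convex (possibly nonsmooth) f given a subgradient oracle."""
    x = np.asarray(x0, dtype=float)            # current proximal center (serious iterate)
    cuts = [(x.copy(), f(x), subgrad(x))]      # bundle of cuts: (y_j, f(y_j), g_j)
    n = x.size
    for _ in range(max_iter):
        # Subproblem over v = (z, t):  min  t + ||z - x||^2 / (2*rho)
        #                              s.t. t >= f(y_j) + <g_j, z - y_j> for every cut j
        def model_prox(v, x=x):
            z, t = v[:n], v[n]
            return t + np.dot(z - x, z - x) / (2.0 * rho)

        constraints = [{"type": "ineq",
                        "fun": lambda v, y=y, fy=fy, g=g: v[n] - (fy + g @ (v[:n] - y))}
                       for (y, fy, g) in cuts]
        res = minimize(model_prox, np.append(x, f(x)),
                       constraints=constraints, method="SLSQP")
        z, model_val = res.x[:n], res.x[n]

        predicted = f(x) - model_val           # decrease predicted by the cutting-plane model
        if predicted <= tol:
            break
        if f(x) - f(z) >= beta * predicted:    # sufficient actual decrease: serious step
            x = z
        # On a null step the center stays put; either way, enrich the bundle at z.
        cuts.append((z.copy(), f(z), subgrad(z)))
    return x


if __name__ == "__main__":
    # Tiny usage example on a sharp nonsmooth function, f(x) = |x1| + 2|x2|.
    f = lambda v: abs(v[0]) + 2.0 * abs(v[1])
    g = lambda v: np.array([np.sign(v[0]), 2.0 * np.sign(v[1])])  # one valid subgradient
    print(proximal_bundle(f, g, x0=[3.0, -2.0]))                  # approaches [0, 0]
```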

Cited by 3 publications (10 citation statements, published 2021–2022); references 27 publications.
“…However, relative to both null and serious steps, prior works analyzing the convergence of bundle methods (see, for example, Kiwiel (2000); Du and Ruszczynski (2017); Diaz and Grimmer (2021)) have only derived sublinear guarantees for global convergence, even when the objective is strongly convex. In contrast, we show in this paper that the Survey Descent iteration, at least in the case of a strongly convex, max-of-smooth function, achieves a local linear convergence rate.…”
Section: Relation to Bundle Methods (mentioning)
Confidence: 99%
“…Juditsky and Nemirovski (2011)). Within the well-studied realm of first-order bundle methods, in particular, theoretical guarantees have remained sublinear relative to the total number of null and serious steps (Kiwiel, 2000; Du and Ruszczynski, 2017; Diaz and Grimmer, 2021), even in the presence of desirable properties such as δ-strongly convex objectives. For comparison, in the smooth setting, GD (and its projected and proximal variants) possesses well-recognized theoretical guarantees of linear convergence on L-smooth and δ-strongly convex objectives (Beck, 2017, Theorem 10.29).…”
Section: Linear Convergence and Nonsmooth Objectives (mentioning)
Confidence: 99%
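For context, the textbook gradient-descent guarantee that this passage contrasts with can be written as follows; this is a standard statement for step size 1/L on an L-smooth, δ-strongly convex objective, in the spirit of Beck (2017), Theorem 10.29, and exact constants vary across references.

```latex
% Linear (geometric) convergence of gradient descent with step size 1/L on an
% L-smooth, delta-strongly convex objective; constants vary slightly by source.
\[
  x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k)
  \quad\Longrightarrow\quad
  f(x_k) - f^\star \;\le\; \Bigl(1 - \tfrac{\delta}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^\star\bigr).
\]
```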
“…where the penultimate inequality follows from (18) and the last inequality follows from $\|P_{\ker(A_i)^{\perp}}(y_i - \hat{y}_0)\| \le \gamma \|y_i - \hat{y}_i\|$ and the bound $\|y_i - \hat{y}_i\| \le \|y_i - \hat{y}_0\|$. Rearranging, we arrive at the desired conclusion.…”
Section: Proof of Theorem 22 (mentioning)
Confidence: 99%
“…Bundle methods often perform well in practice and their convergence/complexity theory is understood in several settings [23,25,33,38,39,47,54,64]. Most relevantly for this work, on sharp convex functions, variants of the bundle method converge superlinearly relative to the number of serious steps [50] and converge linearly relative to both serious and null steps [18].…”
Section: Introduction (mentioning)
Confidence: 99%
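The sharpness property invoked here is usually formalized as linear growth away from the solution set; a common statement, with X* the set of minimizers and μ > 0 the sharpness constant (this matches the exponent-one case of the Hölder growth condition mentioned in the abstract), is:

```latex
% Sharpness (Holder growth with exponent 1): the optimality gap grows at least
% linearly in the distance to the solution set X*.
\[
  f(x) - \min_{y} f(y) \;\ge\; \mu \,\operatorname{dist}\bigl(x, X^\star\bigr)
  \qquad \text{for all } x .
\]
```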
“…for a large range of prox stepsizes $\lambda$. Since the 2C-PB and MC-PB methods do not rely on $(L_f, M_f, \mu)$, a sharper iteration-complexity bound can be obtained for them by replacing $\mu$ and $(M_f, L_f)$ in (5) by $\bar{\mu}$ and $(\bar{M}_f, \bar{L}_f)$, respectively, where $\bar{\mu}$ is the largest $\mu$ such that $h - \mu\|\cdot\|^2/2$ is convex and $(\bar{M}_f, \bar{L}_f)$ is the unique pair which minimizes $M_f^2 + \varepsilon L_f$ over the set of pairs $(M_f, L_f)$ satisfying the $(M_f, L_f)$-hybrid condition of $f'$. Moreover, even though this sharper complexity bound cannot be shown for 1C-PB, Section 5 presents an adaptive version of this variant where $\tau$ in (3), instead of being chosen as a function of $(L_f, M_f, \mu)$, is adaptively searched so as to satisfy a key inequality condition.…”
Section: Introduction (mentioning)
Confidence: 99%