In this work, we consider methods for solving large-scale optimization problems with a possibly nonsmooth objective function. The key idea is to first specify a class of optimization algorithms using a generic iterative scheme involving only linear operations and applications of proximal operators. This scheme contains many modern primal-dual first-order solvers, such as the Douglas-Rachford and primal-dual hybrid gradient methods, as special cases. Moreover, we show convergence to an optimal point for a new method that also belongs to this class. Next, we interpret the generic scheme as a neural network and use unsupervised training to learn the best set of parameters for a specific class of objective functions while imposing a fixed number of iterations. In contrast to other approaches to "learning to optimize", we present an approach that learns parameters only within the set of convergent schemes. As use cases, we consider optimization problems arising in tomographic reconstruction and image deconvolution, in particular a family of total variation regularization problems.

Variational approaches of this type are widely used in imaging, for example in X-ray computed tomography (CT) [40,41], magnetic resonance imaging (MRI) [18], and electron tomography [42]. A key challenge is to handle the computational burden. In imaging, and especially in three-dimensional imaging, the resulting optimization problem is very high-dimensional even after clever digitization and might involve more than one billion variables. Moreover, many regularizers that are popular in imaging (see Section 5), such as those associated with sparsity, result in a nonsmooth objective function. These issues prevent the use of variational methods in time-critical applications, such as medical imaging in a clinical setting. Modern methods that aim to overcome these obstacles are typically based on the proximal point algorithm [46] and operator splitting techniques; see, e.g., [10, 12, 14-16, 20-22, 25, 29, 33, 34] and the references therein.

The main objective of this paper is to offer a computationally tractable approach for minimizing large-scale nondifferentiable, convex functions. The key idea is to "learn" how to optimize from training data, resulting in an iterative scheme that is optimal given a fixed number of steps, while its convergence properties can still be analyzed. We make this precise in Section 4.

Similar ideas have been proposed previously in [8,27,35], but these approaches are either limited to specific classes of iterative schemes, like gradient-descent-like schemes [8,35] that are not applicable to nonsmooth optimization, or specialized to a specific class of regularizers as in [27], which limits the possible choices of regularizers and forward operators. The approach taken here builds on these ideas and yields a general framework for learning optimization algorithms applicable to optimization problems of the type (1.1), inspired by the proximal-type methods mentioned above. A key feature is to present a general formulation that includes several...
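To illustrate the overall idea of training a parametrized, fixed-length iterative scheme in an unsupervised fashion, the following is a minimal sketch, written here in PyTorch (which the paper does not prescribe). A proximal-gradient iteration on randomly generated LASSO-type instances merely stands in for the generic primal-dual scheme discussed above; the problem sizes, the l1 weight `lam`, the learnable step sizes `steps`, and the training loop are all illustrative assumptions, and the training loss is simply the objective value reached after the fixed number of iterations.

```python
# Hypothetical sketch of "learning to optimize": unroll a fixed number of
# iterations built from linear operations and a proximal map, and train the
# scheme's parameters without ground-truth minimizers by minimizing the
# objective value it reaches.  This is NOT the primal-dual scheme analyzed in
# the paper; an ISTA-type iteration on random LASSO instances stands in for it.
import torch

torch.manual_seed(0)
m, n, n_iter = 30, 50, 10      # problem size and fixed iteration budget (assumed)
lam = 0.1                      # l1 regularization weight (assumed)

def objective(A, b, x):
    # F(x) = 0.5 * ||A x - b||_2^2 + lam * ||x||_1  (nonsmooth, convex)
    r = torch.einsum('bmn,bn->bm', A, x) - b
    return 0.5 * (r ** 2).sum(dim=-1) + lam * x.abs().sum(dim=-1)

def prox_l1(v, t):
    # proximal operator of t * lam * ||.||_1 (soft-thresholding)
    return torch.sign(v) * torch.clamp(v.abs() - t * lam, min=0.0)

# learnable per-iteration step sizes: the "parameters" of the unrolled scheme
steps = torch.nn.Parameter(0.01 * torch.ones(n_iter))
opt = torch.optim.Adam([steps], lr=1e-2)

for epoch in range(200):
    # sample a batch of problem instances from the class we want to solve fast
    A = torch.randn(16, m, n) / m ** 0.5
    b = torch.randn(16, m)
    x = torch.zeros(16, n)
    for k in range(n_iter):                       # unrolled, fixed-length scheme
        grad = torch.einsum('bmn,bm->bn', A, torch.einsum('bmn,bn->bm', A, x) - b)
        x = prox_l1(x - steps[k] * grad, steps[k])
    loss = objective(A, b, x).mean()              # unsupervised training loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Unlike this toy sketch, the scheme proposed in the paper restricts the learned parameters to the set of convergent methods, so the resulting solver retains convergence guarantees beyond the trained iteration budget.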