We describe a novel constructive technique for devising efficient first-order methods for a wide range of large-scale convex minimization settings, including smooth, non-smooth, and strongly convex minimization. The technique builds upon a certain variant of the conjugate gradient method to construct a family of methods such that a) all methods in the family share the same worst-case guarantee as the base conjugate gradient method, and b) the family includes a fixed-step first-order method. We demonstrate the effectiveness of the approach by deriving optimal methods for the smooth and non-smooth cases, including new methods that forego knowledge of the problem parameters at the cost of a one-dimensional line search per iteration, and a universal method for the union of these classes that requires a three-dimensional search per iteration. In the strongly convex case, we show how numerical tools can be used to perform the construction, and show that the resulting method offers an improved worst-case bound compared to Nesterov's celebrated fast gradient method.
Introduction

Convex optimization plays a central role in many fields of application, including optimal control, machine learning, and signal processing. In particular, when a convex optimization problem involves a large number of variables, first-order methods are increasingly widespread due to their attractively low computational cost per iteration. This low cost comes at a price, however: first-order methods often suffer from slow convergence, making them appropriate mostly for obtaining low- to medium-accuracy solutions. Nevertheless, first-order methods remain the methods of choice in many applications and currently receive considerable attention from the optimization community, which constantly aims at improving them.

An effective and fruitful approach for analyzing and comparing first-order methods is the study of their worst-case behavior through the black-box model. In this setting, methods are only allowed to gain information on the objective through an oracle, which provides the value and the gradient of the objective at selected points.
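As an informal illustration of this oracle model (not part of the paper's development), the following Python sketch wraps an objective and its gradient into a first-order oracle and runs a plain fixed-step gradient method that interacts with the objective only through oracle calls; all names and the example objective are chosen purely for illustration.

```python
import numpy as np

def make_oracle(f, grad_f):
    """Wrap an objective and its gradient into a first-order (black-box) oracle."""
    def oracle(x):
        return f(x), grad_f(x)
    return oracle

def gradient_method(oracle, x0, step_size, num_iters):
    """Plain gradient descent; the objective is accessed only via oracle queries."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        _, g = oracle(x)           # black-box query: value and gradient at x
        x = x - step_size * g      # fixed-step first-order update
    return x

# Illustrative use: minimize the smooth convex quadratic f(x) = 0.5 * ||x||^2.
oracle = make_oracle(lambda x: 0.5 * np.dot(x, x), lambda x: x)
x_final = gradient_method(oracle, x0=np.ones(5), step_size=0.5, num_iters=50)
```

Worst-case analyses in the black-box model ask how large the accuracy gap can be after a given number of such oracle calls, over all objectives in a prescribed problem class.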