2012
DOI: 10.1137/100802001
Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems

Abstract: In this paper we propose new methods for solving huge-scale optimization problems. For problems of this size, even the simplest full-dimensional vector operations are very expensive. Hence, we propose to apply an optimization technique based on random partial update of decision variables. For these methods, we prove the global estimates for the rate of convergence. Surprisingly enough, for certain classes of objective functions, our results are better than the standard worst-case bounds for deterministic algorithms…
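The technique the abstract alludes to updates a single randomly chosen coordinate (or block) per iteration instead of the full decision vector. A minimal sketch of such a scheme, assuming a cheap per-coordinate gradient oracle grad_i and coordinatewise Lipschitz constants L (all names here are illustrative, not taken from the paper):

```python
import numpy as np

def random_coordinate_descent(grad_i, L, x0, iters, rng=None):
    """Minimize a smooth function by updating one random coordinate per step.

    grad_i(x, i) -> float : i-th partial derivative at x (a cheap oracle)
    L[i]                  : Lipschitz constant of the i-th partial derivative
    Each iteration touches a single entry of x, so no full-dimensional
    vector operation is ever performed.
    """
    rng = rng or np.random.default_rng()
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(iters):
        i = rng.integers(n)              # pick a coordinate uniformly at random
        x[i] -= grad_i(x, i) / L[i]      # 1/L_i gradient step on x_i only
    return x
```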

Cited by 1,012 publications (1,286 citation statements)
References 4 publications
“…For this particular choice (non-asymptotic) convergence rates were only recently derived in [2], although the convergence of the method was extensively studied in the literature under various assumptions [13,3]. Instead of using a deterministic cyclic order, randomized strategies were proposed in [14,12,16] for choosing a block to update at each iteration of the BCGD method. At iteration k, an index i_k is generated randomly according to the probability distribution vector p ∈ ∆_c.…”
Section: Randomized Block Coordinate Gradient Descent
mentioning, confidence: 99%
“…At iteration k, an index i_k is generated randomly according to the probability distribution vector p ∈ ∆_c. In [14] the distribution vector was chosen as…”
Section: Randomized Block Coordinate Gradient Descent
mentioning, confidence: 99%
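The formula itself is cut off in the excerpt; in [14] (the paper under review) the sampling probabilities are taken proportional to powers of the coordinatewise Lipschitz constants, p_α(i) = L_i^α / Σ_j L_j^α, with α = 0 recovering uniform sampling. A minimal sketch of such a sampler (the helper name is illustrative):

```python
import numpy as np

def block_sampler(L, alpha=1.0, rng=None):
    """Sample block indices with probability p_i proportional to L_i**alpha.

    alpha = 0 gives uniform sampling over the c blocks; alpha = 1 favors
    blocks with larger coordinatewise Lipschitz constants L_i.  The vector
    p lies in the simplex Delta_c, matching p in the excerpt above.
    """
    rng = rng or np.random.default_rng()
    p = np.asarray(L, dtype=float) ** alpha
    p /= p.sum()                         # normalize onto the simplex
    return lambda: int(rng.choice(len(p), p=p))

# e.g. sample = block_sampler([1.0, 4.0, 9.0], alpha=1.0); i_k = sample()
```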
“…Randomized coordinate descent has been shown to be competitive with the classical gradient descent method, in the sense that it requires less work per iteration, but a comparable number of iterations to converge [8]. In this section, we demonstrate that a similar property holds for asynchronous incremental block-coordinate descent: if the amount of work required to evaluate a partial gradient is proportional to its block size, then incremental block-coordinate descent can always be expected to be more efficient than a corresponding incremental gradient descent algorithm.…”
Section: Efficiency Comparison With Asynchronous Incremental Gradient…
mentioning, confidence: 99%
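The claim in this excerpt is easy to probe numerically: counting one partial-gradient evaluation as one unit of work (the proportional-to-block-size assumption above), full gradient descent spends n units per iteration while coordinate descent spends one. A toy comparison on a convex quadratic, purely illustrative and not the analysis from [8]:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A = M @ M.T / n + np.eye(n)            # positive definite Hessian
b = rng.standard_normal(n)
L_full = np.linalg.eigvalsh(A)[-1]     # global Lipschitz constant of the gradient
L_coord = np.diag(A)                   # coordinatewise Lipschitz constants A_ii

def f(x):
    return 0.5 * x @ A @ x - b @ x

# Full gradient descent: n partial-gradient units of work per iteration.
x = np.zeros(n)
for _ in range(500):
    x -= (A @ x - b) / L_full
print("gradient descent  :", f"work={500 * n}", f"f={f(x):.6f}")

# Random coordinate descent: 1 unit per iteration, same total work budget.
x = np.zeros(n)
for _ in range(500 * n):
    i = rng.integers(n)
    x[i] -= (A[i] @ x - b[i]) / L_coord[i]
print("coordinate descent:", f"work={500 * n}", f"f={f(x):.6f}")
```

Under the matched work budget, the two methods typically reach objective values of the same order, which is the sense of "competitive" in the excerpt: cheaper iterations compensate for needing more of them.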