2016
DOI: 10.1137/16m1085905

Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent

Abstract: We propose a new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends on a small number of coordinates only. Our method (APPROX) is simultaneously Accelerated, Parallel, and PROXimal; this is the first time such a method has been proposed. In the special case when the number of processors is equal to the number of coordinates, the method converges at the rate 2ωLR²/(k + 1)², where k is the iteration counter, ω is a data-weighted average degree of …
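For a concrete picture of the method summarized above, the following is a minimal NumPy sketch of an APPROX-style accelerated, parallel, and proximal coordinate descent loop, written here for a lasso objective 0.5·‖Ax − b‖² + λ‖x‖₁. The problem instance, variable names, sampling scheme (uniform τ-nice), and step-size choices below are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
import numpy as np

def approx_lasso(A, b, lam, tau, n_iters, seed=0):
    """Sketch of accelerated, parallel (tau coordinates per step), proximal
    coordinate descent for 0.5*||Ax - b||^2 + lam*||x||_1 (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    v = (A ** 2).sum(axis=0)              # coordinate-wise Lipschitz constants ||A_:i||^2 (assumed nonzero)
    x = np.zeros(n)
    z = x.copy()
    theta = tau / n                        # initial momentum parameter theta_0 = tau / n
    for _ in range(n_iters):
        y = (1.0 - theta) * x + theta * z
        S = rng.choice(n, size=tau, replace=False)      # uniform tau-nice sampling of coordinates
        grad_S = A[:, S].T @ (A @ y - b)                # partial gradients of the smooth part at y
        step = tau / (n * theta * v[S])                 # per-coordinate step sizes
        u = z[S] - step * grad_S
        z_S_new = np.sign(u) * np.maximum(np.abs(u) - lam * step, 0.0)  # prox of lam*|.| (soft-threshold)
        x = y.copy()
        x[S] += (n * theta / tau) * (z_S_new - z[S])    # accelerated interpolation step
        z[S] = z_S_new
        theta = 0.5 * (np.sqrt(theta ** 4 + 4.0 * theta ** 2) - theta ** 2)
    return x

# Illustrative usage on a small synthetic instance.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 500))
x_true = np.zeros(500); x_true[:10] = 1.0
b = A @ x_true
x_hat = approx_lasso(A, b, lam=0.1, tau=50, n_iters=300)
```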

Cited by 18 publications (11 citation statements) · References 33 publications
“…Note that, as for the iteration (4), in each step p m subproblems have to be solved but storage and update work increase. Remedies are available, see [6,23] for discussions on implementation issues. We have the following convergence result:…”
Section: Theoretical Results (mentioning)
confidence: 99%
“…Besides a deterministic or greedy pick, we may also choose the next subproblem in a random fashion according to a probability distribution ρ on the set of subspaces, see [8] and the references cited therein. The analysis of such stochastic iterations has been a very active research topic in large-scale convex optimization, see [6] for a recent survey, but also in the area of machine learning and compressed sensing. Compared to the greedy approach, the cost for determining the next subspace is dramatically reduced to the cost of sampling the underlying probability distribution ρ.…”
Section: Introduction (mentioning)
confidence: 99%
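The random subspace pick described in the statement above reduces the selection cost to a single draw from the distribution ρ, rather than a greedy scan over all subproblems. A toy sketch of that selection step follows; the block partition and the values of ρ are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
blocks = [np.arange(0, 10), np.arange(10, 25), np.arange(25, 40)]   # coordinate blocks (assumed partition)
rho = np.array([0.25, 0.375, 0.375])                                # sampling probabilities over blocks, sum to 1

for k in range(5):                        # a few iterations of the outer loop
    i = rng.choice(len(blocks), p=rho)    # one draw from rho, instead of evaluating every block greedily
    S = blocks[i]                         # coordinates of the selected subproblem
    # ... solve the local subproblem restricted to S and update the iterate ...
```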
“…Distributed optimization: In recent years, a lot of effort has been devoted to designing distributed first-order methods (Mahajan et al, 2013;Shamir and Srebro, 2014;Lee et al, 2017;Fercoq and Richtárik, 2016;Liu et al, 2014;Necoara and Clipici, 2016;Richtárik and Takáč, 2016;Liu et al, 2020), which only rely on gradient information of the objective function. However, first-order methods suffer from: (i) a dependence on a suitably defined condition number; (ii) spending more time on communication than on computation.…”
Section: Related Work (mentioning)
confidence: 99%