2016
DOI: 10.1080/10556788.2016.1190361

Coordinate descent with arbitrary sampling II: expected separable overapproximation

Abstract: The design and complexity analysis of randomized coordinate descent methods, and in particular of variants which update a random subset (sampling) of coordinates in each iteration, depends on the notion of expected separable overapproximation (ESO). This refers to an inequality involving the objective function and the sampling, capturing in a compact way certain smoothness properties of the function in a random subspace spanned by the sampled coordinates. ESO inequalities were previously established for specia…
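To make the abstract's description concrete, an ESO inequality in this line of work typically takes the following form (a sketch in the notation common to this literature, not quoted from the paper itself):

```latex
% A function f and a sampling \hat{S} admit an ESO with parameters
% v_1, \dots, v_n > 0 if, for all x, h \in \mathbb{R}^n,
\mathbf{E}\big[ f(x + h_{[\hat{S}]}) \big]
  \;\le\; f(x) + \sum_{i=1}^{n} p_i \Big( \nabla_i f(x)\, h_i + \tfrac{v_i}{2}\, h_i^2 \Big),
% where p_i = \mathbf{P}(i \in \hat{S}) and h_{[\hat{S}]} zeroes out the
% coordinates of h outside the sampled set \hat{S}. The right-hand side is
% separable in the coordinates, which is what makes parallel updates and
% compact complexity bounds possible.
```

For a uniform sampling with $p_i = \tau/n$ this reduces to $f(x) + \tfrac{\tau}{n}\big(\langle \nabla f(x), h\rangle + \tfrac{1}{2}\|h\|_v^2\big)$, the form that appears in complexity bounds for mini-batch coordinate descent.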

Cited by 42 publications (90 citation statements) · References 30 publications
“…Mini-batch methods (which instead of just one data-example use updates from several examples per iteration) are more flexible and lie within these two communication vs. computation extremes. However, mini-batch versions of both SGD and coordinate descent (CD) [13,14,37,46–48,52–54,61,69,74] suffer from their convergence rate degrading towards the rate of batch gradient descent as the size of the mini-batch is increased. This follows because mini-batch updates are made based on the outdated previous parameter vector w, in contrast to methods that allow immediate local updates like CoCoA.…”
Section: Discussion and Related Work
confidence: 99%
“…In the time between the first online appearance of this work on arXiv (October 2013; arXiv:1310.3438), and the time this paper went to press, this work led to a number of extensions [3,7,16–18]. All of these papers share the defining feature of NSync, namely, its ability to work with an arbitrary probability law defining the selection of the active coordinates in each iteration.…”
Section: Literature
confidence: 99%
“…Motivated by the introduction of the nonuniform ESO assumption in this paper, and the development in Sect. 3 of our work, an entire paper was recently written, dedicated to the study of nonuniform ESO inequalities [16]. We now turn to the second and final assumption.…”
Section: Assumptions
confidence: 99%