2021
DOI: 10.48550/arxiv.2102.09718
Preprint
Permutation-Based SGD: Is Random Optimal?

Abstract: A recent line of ground-breaking results for permutation-based SGD has corroborated a widely observed phenomenon: random permutations offer faster convergence than with-replacement sampling. However, is random optimal? We show that this depends heavily on what functions we are optimizing, and the convergence gap between optimal and random permutations can vary from exponential to nonexistent. We first show that for 1-dimensional strongly convex functions, with smooth second derivatives, there exist optimal per…

Cited by 2 publications (7 citation statements) | References 9 publications
“…While it is possible to sample this function uniformly at random, it has been observed [6,7,23] that traversing a (possibly random) permutation of the functions works better in practice. Recent theoretical works confirmed this observation, showing that using a random permutation instead of random sampling can lead to faster convergence [17,20,22,24,33].…”
Section: Introduction (mentioning)
confidence: 80%
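To make the contrast concrete, the following is a minimal Python sketch (not from the paper or the citing works) that runs SGD on a toy 1-dimensional least-squares sum both with with-replacement sampling and with a fresh random permutation per epoch (random reshuffling); the toy objective, step size, and all names are illustrative assumptions.

```python
# A minimal sketch contrasting with-replacement SGD and random-reshuffling
# (permutation-based) SGD on a toy finite-sum least-squares problem:
# f(x) = (1/n) * sum_i (a_i * x - b_i)^2.  All names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 32
a = rng.normal(size=n) + 2.0           # per-example coefficients
b = rng.normal(size=n)                 # per-example targets
x_star = np.sum(a * b) / np.sum(a**2)  # minimizer of the full objective

def grad_i(x, i):
    """Gradient of the i-th component f_i(x) = (a_i x - b_i)^2."""
    return 2.0 * a[i] * (a[i] * x - b[i])

def sgd(x0, epochs, lr, reshuffle):
    x = x0
    for _ in range(epochs):
        if reshuffle:
            order = rng.permutation(n)          # fresh permutation per epoch (RR)
        else:
            order = rng.integers(0, n, size=n)  # with-replacement sampling
        for i in order:
            x -= lr * grad_i(x, i)
    return x

for reshuffle in (False, True):
    x = sgd(x0=0.0, epochs=50, lr=0.01, reshuffle=reshuffle)
    label = "random reshuffling" if reshuffle else "with-replacement  "
    print(f"{label}: |x - x*| = {abs(x - x_star):.3e}")
```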
“…In [24], an Ω(1/(N²T²)) lower bound is established for random reshuffling. On the other hand, Rajput et al. [22] show that for 1-dimensional functions with smooth Hessian, a good order exists which yields exponential convergence, beating RR. However, their proof is non-constructive and does not provide a technique for obtaining this order.…”
Section: Related Work (mentioning)
confidence: 99%
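Since the cited existence proof is non-constructive, the sketch below only illustrates the phenomenon by brute force: on a tiny 1-dimensional quadratic sum it enumerates every ordering of a single epoch and compares the best one against a random permutation. The objective, step size, and names are assumptions made for illustration, not the paper's construction.

```python
# Brute-force illustration: on f(x) = (1/n) * sum_i 0.5 * (x - c_i)^2 we
# enumerate every ordering of one epoch and compare the best ordering
# against a random permutation.
import itertools
import numpy as np

rng = np.random.default_rng(1)
c = rng.normal(size=6)  # component minimizers; the global minimizer is mean(c)
x_star = c.mean()
lr = 0.3
x0 = 5.0

def run_epoch(x, order):
    """One incremental-gradient pass over the components in the given order."""
    for i in order:
        x -= lr * (x - c[i])  # gradient of 0.5 * (x - c_i)^2 is (x - c_i)
    return x

# Distance to the minimizer after one epoch, for every possible ordering.
errs = {p: abs(run_epoch(x0, p) - x_star) for p in itertools.permutations(range(len(c)))}
best = min(errs, key=errs.get)
rand = tuple(int(i) for i in rng.permutation(len(c)))
print("best ordering       :", best, f"error {errs[best]:.3e}")
print("a random permutation:", rand, f"error {errs[rand]:.3e}")
```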
“…Concretely, Mohtashami et al. [2022] propose evaluating gradients on all the examples first to minimize Equation (2) before starting an epoch, applied to Federated Learning; Lu et al. [2021a] provide an alternative that minimizes Equation (2) using stale gradients from the previous epoch to estimate the gradient on each example. Rajput et al. [2021] introduce an interesting variant of RR that reverses the ordering every other epoch, achieving better rates for quadratics. Other approaches, such…”
Section: Related Work (mentioning)
confidence: 99%
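A minimal sketch of that reverse-every-other-epoch schedule follows: draw a fresh permutation on even epochs and replay it in reverse on the next epoch. The generator and its name are ours for illustration, not the authors' code.

```python
# Illustrative ordering schedule: a fresh random permutation on even epochs,
# replayed in reverse on the following (odd) epoch.
import numpy as np

def flip_flop_orders(n, epochs, seed=0):
    """Yield one index ordering per epoch: random permutation, then its reverse."""
    rng = np.random.default_rng(seed)
    order = None
    for epoch in range(epochs):
        if epoch % 2 == 0:
            order = rng.permutation(n)  # fresh shuffle on even epochs
            yield order
        else:
            yield order[::-1]           # same permutation, reversed, on odd epochs

for epoch, order in enumerate(flip_flop_orders(n=5, epochs=4)):
    print(f"epoch {epoch}: {order.tolist()}")
```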
“…RR allows the optimizer to converge faster empirically and enjoys a better convergence rate in theory [Mishchenko et al., 2020]. Despite RR's theoretical improvement, it has been proven that RR does not always guarantee a good ordering [Yun et al., 2021, De Sa, 2020a]; in fact, a random permutation is far from being optimal even when optimizing a simple quadratic objective [Rajput et al., 2021]. In light of this, a natural research question is:…”
Section: Introduction (mentioning)
confidence: 99%