2022
DOI: 10.48550/arxiv.2202.01838
Preprint

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

Abstract: While SGD, which samples from the data with replacement, is widely studied in theory, a variant called Random Reshuffling (RR) is more common in practice. RR iterates through random permutations of the dataset and has been shown to converge faster than SGD. When the order is instead chosen deterministically, in a variant called incremental gradient descent (IG), the existing convergence bounds show an improvement over SGD but are worse than those for RR. However, these bounds do not differentiate between a good and a bad ordering and …
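For readers unfamiliar with the three sampling schemes named in the abstract, the minimal Python sketch below contrasts them on a generic finite-sum objective. It is an illustration only, not code from the paper; `grad_fn(params, i)` is a hypothetical callback assumed to return the gradient of the i-th example's loss, and the only difference between the three variants is the index sequence they follow.

```python
import numpy as np

def sgd_epoch(params, grad_fn, n, lr, rng):
    # SGD with replacement: each step samples an index uniformly at random,
    # so some examples may be visited several times in an epoch and others not at all.
    for _ in range(n):
        i = rng.integers(n)
        params -= lr * grad_fn(params, i)
    return params

def rr_epoch(params, grad_fn, n, lr, rng):
    # Random Reshuffling (RR): draw a fresh random permutation each epoch
    # and visit every example exactly once.
    for i in rng.permutation(n):
        params -= lr * grad_fn(params, i)
    return params

def ig_epoch(params, grad_fn, n, lr, order):
    # Incremental gradient descent (IG): a fixed, deterministic ordering
    # reused every epoch; convergence then depends on how good that ordering is.
    for i in order:
        params -= lr * grad_fn(params, i)
    return params
```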

Cited by 2 publications (6 citation statements) | References 19 publications
“…HaoChen and Sra [2019], Gürbüzbalaban et al. [2021], and Mishchenko et al. [2020] discuss extensively the conditions needed for RR to be beneficial. Some recent works [Lu et al., 2021a; Mohtashami et al., 2022] suggest constructing better data permutations than RR via a memory-intensive greedy strategy. Concretely, Mohtashami et al. [2022] propose evaluating gradients on all the examples to minimize Equation (2) before starting an epoch, applied to Federated Learning; Lu et al. [2021a] provide an alternative that minimizes Equation (2) using stale gradients from the previous epoch to estimate the gradient on each example. Rajput et al. [2021] introduce an interesting variant of RR that reverses the ordering every other epoch, achieving better rates for quadratics.…”
Section: Related Work
confidence: 99%
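Equation (2) of the citing work is not reproduced on this page. Purely as an illustration of the stale-gradient greedy strategy described in the excerpt, the sketch below assumes the quantity being minimized penalizes how far prefix sums of centered per-example gradients drift from zero, and uses hypothetical names throughout; it is not the authors' implementation.

```python
import numpy as np

def greedy_order_from_stale_grads(stale_grads):
    """Greedily order examples so that running sums of centered (stale)
    gradients stay close to zero. Hypothetical sketch, not the cited code."""
    n = len(stale_grads)
    mean = np.mean(stale_grads, axis=0)
    centered = [g - mean for g in stale_grads]
    remaining = set(range(n))
    running = np.zeros_like(mean)
    order = []
    while remaining:
        # Pick the example whose stale gradient keeps the prefix sum smallest in norm.
        best = min(remaining, key=lambda i: np.linalg.norm(running + centered[i]))
        order.append(best)
        running += centered[best]
        remaining.remove(best)
    return order
```

Storing a stale gradient per example and scanning the remaining set at every step is what makes this kind of greedy strategy memory-intensive (and O(n²) per epoch), which matches how the excerpt characterizes it.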