2021
DOI: 10.1109/tsp.2021.3099977

Communication-Adaptive Stochastic Gradient Methods for Distributed Learning

Cited by 15 publications (16 citation statements)
References 14 publications
“…As shown in Corollary 1, the overall sample complexity (i.e., the total number of data samples required to achieve an ε-accurate stationary point) of our FBO-AggITD is O(κ^9 ε^{-2}), which matches the sample complexities of stocBiO (Ji et al., 2021), BSA (Ghadimi & Wang, 2018) and ALSET (Chen et al., 2021a) in non-federated bilevel optimization and FedNest (Tarzanagh et al., 2022) in the federated setting, despite the data heterogeneity. Compared to FedNest, our method is much simpler with lower communication costs.…”
Section: Convergence and Complexity Analysis (supporting)
confidence: 66%
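For context, a minimal worked statement of the complexity notion quoted above, under the conventions usual in the stochastic bilevel literature (the exact constants, and whether the target is ε or ε², vary by paper); here Φ(x) = f(x, y*(x)) denotes the upper-level objective and κ the condition number of the strongly convex lower-level problem:

\[ \mathbb{E}\,\big\|\nabla \Phi(\bar{x})\big\|^{2} \le \epsilon \ \ \text{(an $\epsilon$-accurate stationary point)}, \qquad \text{total samples} = \mathcal{O}\!\left(\kappa^{9}\,\epsilon^{-2}\right), \quad \kappa = L/\mu . \]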
“…), as shown in the final convergence analysis. It is worth mentioning that these two terms match the error bound of the stochastic AID-based hypergradient estimator in the non-federated setting (Ji et al., 2021; Ghadimi & Wang, 2018; Chen et al., 2021a), and hence our analysis can be of independent interest for non-federated bilevel optimization. Also note that the last two non-vanishing error terms O(λ^2 β^2) and O(λ β^2) are induced by the client drift in the y updates, which arises specifically in the federated setting and can be addressed by choosing a sufficiently small stepsize β. Technically, we first show via a recursive analysis that the key approximation error between the expected indirect part of the AggITD estimator…”
Section: Estimation Properties for AggITD (mentioning)
confidence: 64%
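The estimator discussed in this excerpt approximates the standard implicit-differentiation hypergradient. As a sketch, assuming an upper-level objective f(x, y) and a strongly convex lower-level objective g(x, y) with minimizer y*(x) (notation assumed here, not taken from the excerpt):

\[ \nabla \Phi(x) = \nabla_{x} f\big(x, y^{*}(x)\big) \;-\; \nabla^{2}_{xy} g\big(x, y^{*}(x)\big)\,\big[\nabla^{2}_{yy} g\big(x, y^{*}(x)\big)\big]^{-1}\, \nabla_{y} f\big(x, y^{*}(x)\big). \]

Stochastic AID/ITD estimators replace y*(x) with an inexact inner solution and the inverse Hessian-vector product with a truncated (e.g., Neumann-series) approximation; those approximations are the source of the error terms analyzed above, with the client-drift terms O(λ^2 β^2) and O(λ β^2) appearing only in the federated setting.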
“…Recently, several prevailing machine learning applications can be naturally formulated as bilevel programming problems (Maclaurin et al., 2015; Pedregosa, 2016; Finn et al., 2017; Franceschi et al., 2017, 2018; Ji et al., 2020), which has brought a lot of attention to bilevel programming in the machine learning community. On the theoretical side, many existing works derive both asymptotic (Franceschi et al., 2018; Shaban et al., 2019; Liu et al., 2021) and non-asymptotic (Ghadimi & Wang, 2018; Hong et al., 2020; Chen et al., 2021a; Guo & Yang, 2021) convergence analyses for deterministic or stochastic bilevel optimization. For example, Ghadimi & Wang (2018), Hong et al. (2020), and Arbel & Mairal (2022) proved convergence for SGD-type bilevel methods via the approximate implicit differentiation (AID) approach.…”
Section: Related Work (mentioning)
confidence: 99%
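For reference, the generic bilevel programming problem that this related-work excerpt refers to can be written as (a standard formulation, with notation assumed here rather than taken from the cited papers):

\[ \min_{x}\; \Phi(x) := f\big(x, y^{*}(x)\big) \quad \text{s.t.} \quad y^{*}(x) \in \arg\min_{y}\; g(x, y), \]

e.g., in hyperparameter optimization, x collects the hyperparameters (upper level) and y the model parameters trained under those hyperparameters (lower level).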