2022
DOI: 10.48550/arxiv.2203.04850
Preprint

Federated Minimax Optimization: Improved Convergence Analyses and Algorithms

Abstract: In this paper, we consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs. Large-scale edge-based collection of training data in these applications calls for communication-efficient distributed optimization algorithms, such as those used in federated learning, to process the data. In this paper, we analyze Local stochastic gradient descent ascent (SGDA), the local-update version of the SGDA algorithm. SGDA is the core algorithm used in mini…

Cited by 4 publications (6 citation statements)
References 15 publications
“…Otherwise, at the cost of increased gradient complexity, each device can query the oracle $O(1/\epsilon^2)$ times every round and average the results, making the stochastic-gradient variance $O(\epsilon^2)$. This procedure makes the bound vanish and yields a gradient complexity matching the one given in [43] for the federated learning scenario. Fig.…”
Section: B. Non-convex Loss Function
confidence: 98%
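The variance-reduction trick described in the snippet above — querying the stochastic oracle many times per round and averaging — can be illustrated with a minimal sketch. The quadratic objective, noise level, and sample counts below are illustrative assumptions, not the cited paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x, sigma=1.0):
    """Stochastic gradient oracle for the toy objective f(x) = 0.5 * x**2:
    the true gradient x plus zero-mean noise of variance sigma**2."""
    return x + sigma * rng.standard_normal()

def averaged_grad(x, num_queries, sigma=1.0):
    """Query the oracle num_queries times and average; the variance of the
    resulting estimate drops from sigma**2 to sigma**2 / num_queries."""
    return np.mean([noisy_grad(x, sigma) for _ in range(num_queries)])

# Empirical check: the variance of the averaged estimator is about 1/K.
K = 100
samples = [averaged_grad(1.0, K) for _ in range(2000)]
print(np.var(samples))  # expected to be close to 1 / K = 0.01
```

Averaging $K = O(1/\epsilon^2)$ queries is exactly what shrinks the per-round variance to $O(\epsilon^2)$ at a $K$-fold cost in gradient complexity.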
“…The authors also prove sub-linear convergence for Local SGDA under diminishing stepsizes. Their convergence guarantee was later improved by [26] to match the results of centralized SGDA [15]. However, we note that all of these algorithms require diminishing learning rates to obtain exact solutions, and therefore suffer from relatively slow convergence, whereas our algorithm allows constant stepsizes and hence achieves linear convergence.…”
Section: Related Work
confidence: 99%
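Local SGDA, the algorithm these snippets discuss, alternates a few local descent/ascent steps on each client with server-side averaging of the iterates. A minimal sketch, assuming a toy quadratic minimax objective with a saddle point at the origin (the client objectives, noise level, and hyperparameters are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy federated minimax problem (an illustrative assumption): client i holds
# f_i(x, y) = 0.5*x^2 + b_i*x*y - 0.5*y^2, so the global saddle point is (0, 0).
b = rng.uniform(-1.0, 1.0, size=5)   # one coupling coefficient per client
noise = 0.1                          # stochastic-gradient noise level

def local_sgda(x0, y0, rounds=200, local_steps=10, lr=0.05):
    """Local SGDA: each client runs `local_steps` descent/ascent steps on
    stochastic gradients of its own f_i, then the server averages iterates."""
    x, y = x0, y0
    for _ in range(rounds):
        xs, ys = [], []
        for bi in b:                 # each client starts from the average
            xi, yi = x, y
            for _ in range(local_steps):
                gx = xi + bi * yi + noise * rng.standard_normal()
                gy = bi * xi - yi + noise * rng.standard_normal()
                xi -= lr * gx        # descent step on x
                yi += lr * gy        # ascent step on y
            xs.append(xi)
            ys.append(yi)
        x, y = np.mean(xs), np.mean(ys)  # server averaging
    return x, y

x, y = local_sgda(2.0, -2.0)
print(abs(x), abs(y))  # both should end up small, near the saddle (0, 0)
```

With constant stepsizes and persistent gradient noise the iterates only reach a noise-dominated neighborhood of the saddle, which is the behavior the diminishing-stepsize requirement in the snippet above is meant to remove.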
“…By solving $\min_{x \in \mathbb{R}^d} \max_{\|y\| \le 1} \frac{1}{m} \sum_{i=1}^m f_i(x, y)$, we obtain a global robust model of the linear regression problem even under the worst contamination of gross noise. To measure the convergence of algorithms, we use the robust loss, i.e., given a model $x$, the corresponding robust loss [25,26] is defined by $f(x) = \max_{\|y\| \le 1} \sum_{i=1}^m f_i(x, y)$. We generate local models and data as follows: the local model $x_i^*$ is generated by a multivariate normal distribution.…”
Section: Robust Linear Regression
confidence: 99%
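Evaluating the robust loss above requires solving the inner maximization over the unit ball. The snippet does not give the form of $f_i$, so the sketch below assumes a hypothetical instance that is linear in $y$ — $f_i(x, y) = y^\top a_i (a_i^\top x - b_i)$ — which makes the inner maximum available in closed form as a check on the projected-ascent loop:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data (not the paper's experiment): with the linear-in-y choice
# above, sum_i f_i(x, y) = y @ r where r = A.T @ (A @ x - b_vec), so the
# constrained maximum over ||y|| <= 1 equals the norm of r.
m, d = 20, 5
A = rng.standard_normal((m, d))
b_vec = rng.standard_normal(m)

def robust_loss(x, steps=500, lr=0.1):
    """Evaluate f(x) = max_{||y|| <= 1} sum_i f_i(x, y) by projected
    gradient ascent on y over the unit Euclidean ball."""
    r = A.T @ (A @ x - b_vec)      # gradient in y of the (linear) inner sum
    y = np.zeros(d)
    for _ in range(steps):
        y = y + lr * r             # ascent step
        norm = np.linalg.norm(y)
        if norm > 1.0:             # project back onto the unit ball
            y = y / norm
    return float(y @ r)

x = rng.standard_normal(d)
print(robust_loss(x), np.linalg.norm(A.T @ (A @ x - b_vec)))  # should agree
```

For general nonconvex-in-$y$ objectives no such closed form exists, and the same projected-ascent inner loop is the standard way to approximate the robust loss.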
“…[2020], Lu et al. [2020], Yan et al. [2020], Guo et al. [2021], Sharma et al. [2022]. Among them, [Zhang et al., 2021b] achieved the optimal complexity $O(\sqrt{\kappa}\,\epsilon^{-2})$ in the deterministic case by introducing the Catalyst acceleration scheme [Lin et al., 2015, Paquette et al., 2018] into minimax problems, and Luo et al. [2020] and Zhang et al. [2021b] achieved the best complexities known so far in the finite-sum case, which are $O(\sqrt{n}\,\kappa^2 \epsilon^{-2})$ and $O(n^{3/4} \sqrt{\kappa}\,\epsilon^{-2})$, respectively.…”
Section: Literature Review
confidence: 99%