The communication bottleneck has been identified as a significant issue in distributed optimization of large-scale learning models. Recently, several approaches have been proposed to mitigate this problem, including various forms of gradient compression and computing local models that are mixed iteratively. In this paper we propose the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation, along with error compensation, by keeping track of the difference between the true and compressed gradients. We propose both synchronous and asynchronous implementations of Qsparse-local-SGD. We analyze the convergence of Qsparse-local-SGD in the distributed setting for smooth non-convex and convex objective functions, and demonstrate that it converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers. We use Qsparse-local-SGD to train ResNet-50 on ImageNet and show that it yields significant savings over the state of the art in the number of bits transmitted to reach a target accuracy.
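To make the error-compensation mechanism concrete, the following is a minimal Python sketch of the compression step described above: a Top-k sparsifier composed with a scaled sign quantizer, with the residual between the true and compressed update carried in a local memory vector. The helper names (`topk`, `sign_quantize`, `compress_with_error_feedback`) and the specific quantizer are illustrative assumptions, not the paper's exact implementation; Qsparse-local-SGD additionally interleaves several local SGD steps between communication rounds.

```python
import numpy as np

def topk(v, k):
    """Top-k sparsifier: keep the k largest-magnitude entries of a 1-D
    vector v (assumes 1 <= k <= v.size) and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Scaled sign quantizer: one bit per nonzero entry plus one shared
    scale (the mean magnitude of the nonzero entries)."""
    scale = np.abs(v).sum() / max(np.count_nonzero(v), 1)
    return scale * np.sign(v)

def compress_with_error_feedback(update, memory, k):
    """Compose sparsification and quantization; the residual between the
    true (memory-corrected) update and the compressed one is stored in
    `memory` and added back before the next compression."""
    corrected = update + memory
    compressed = sign_quantize(topk(corrected, k))
    memory = corrected - compressed  # error compensation
    return compressed, memory

# Example: compress a 10-dimensional update, keeping k=3 coordinates.
rng = np.random.default_rng(0)
g, mem = rng.standard_normal(10), np.zeros(10)
q, mem = compress_with_error_feedback(g, mem, k=3)
```

Adding the memory back before compressing ensures that every coordinate of the true update is eventually transmitted, which is the intuition behind why error-compensated compression can match the convergence rate of uncompressed SGD.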
Federated learning and analytics constitute a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving federated optimization problems, which emphasize communication efficiency, data heterogeneity, compatibility with privacy and system requirements, and other constraints that are not primary considerations in other problem settings. This paper provides recommendations and guidelines on formulating, designing, evaluating, and analyzing federated optimization algorithms through concrete examples and practical implementations, with a focus on conducting effective simulations to infer real-world performance. The goal of this work is not to survey the current literature, but to inspire researchers and practitioners to design federated learning algorithms that can be used in various practical applications.
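For concreteness, the federated optimization problems referred to above are conventionally formulated as minimizing a weighted sum of client objectives over decentralized data; the notation below is the standard one in this literature, not a quotation from the paper:

```latex
\min_{x \in \mathbb{R}^d} \; F(x) = \sum_{i=1}^{M} p_i \, F_i(x),
\qquad
F_i(x) = \mathbb{E}_{\xi \sim \mathcal{D}_i}\big[ f_i(x; \xi) \big],
```

where M is the number of clients, \(\mathcal{D}_i\) is client i's local data distribution, and the weights \(p_i \ge 0\) sum to one. Data heterogeneity corresponds to the \(\mathcal{D}_i\) differing across clients, which is what makes federated optimization harder than its centralized counterpart.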
In secure multiparty computation (MPC), mutually distrusting users collaborate to compute a function of their private data without revealing any additional information about their data to the other users. While it is known that information-theoretically secure MPC is possible among n users who have access to private randomness and are pairwise connected by secure, noiseless, bidirectional links, against the collusion of fewer than n/2 users (in the honest-but-curious model; the threshold is n/3 in the malicious model), relatively little is known about the communication and randomness complexity of secure computation, i.e., the amount of communication and randomness required to compute securely.

In this work, we employ information-theoretic techniques to obtain lower bounds on the communication and randomness complexity of secure MPC. We restrict ourselves to a concrete interactive setting involving three users, under which all functions are securely computable against corruption of individual users in the honest-but-curious model. We derive lower bounds for both the perfect-security case (i.e., zero error and no leakage of information) and asymptotic security (where the probability of error and the information leakage vanish as the block length goes to ∞).

Our techniques include the use of a data processing inequality for residual information (i.e., the gap between mutual information and Gács-Körner common information), a new information inequality for 3-user protocols, and the idea of distribution switching, by which lower bounds computed under certain worst-case scenarios can be shown to apply in general.

Our lower bounds are shown to be tight for various functions of interest. In particular, we exhibit concrete functions that have "communication-ideal" protocols, i.e., protocols that achieve the minimum communication simultaneously on all links in the network. We also obtain the first explicit example of a function that incurs a higher communication cost than its input length in the secure computation model of Feige, Kilian, and Naor [1], who had shown that such functions exist. Finally, we show that our communication bounds imply tight lower bounds on the amount of randomness required by MPC protocols for many interesting functions.
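As a pointer to the central quantity, the residual information used in these bounds is exactly the gap the abstract describes; the notation below is ours, stated under the standard definition of Gács-Körner common information:

```latex
\mathrm{RI}(X;Y) = I(X;Y) - C_{\mathrm{GK}}(X;Y),
\qquad
C_{\mathrm{GK}}(X;Y) = \max_{\substack{g,\,h \,:\, g(X) = h(Y) \text{ a.s.}}} H\big(g(X)\big),
```

i.e., \(C_{\mathrm{GK}}\) is the entropy of the maximal common random variable that the two users can each extract deterministically from their own observations, and \(\mathrm{RI}\) measures the dependence between X and Y that is not captured by any such common part.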
We study stochastic gradient descent (SGD) with local iterations in the presence of malicious/Byzantine clients, motivated by federated learning. Instead of communicating with the central server in every iteration, the clients maintain local models, which they update by taking several SGD iterations based on their own datasets, and then communicate the net update to the server, thereby achieving communication efficiency. Furthermore, only a subset of clients communicates with the server, and this subset may differ across synchronization times. The Byzantine clients may collaborate and send arbitrary vectors to the server to disrupt the learning process. To combat the adversary, we employ an efficient high-dimensional robust mean estimation algorithm at the server to filter out corrupt vectors; and to analyze this outlier-filtering procedure, we develop a novel matrix concentration result that may be of independent interest.

We provide convergence analyses for both strongly convex and non-convex smooth objectives in the heterogeneous data setting, where different clients may have different local datasets, and we make no probabilistic assumptions on data generation. We believe ours is the first Byzantine-resilient algorithm and analysis with local iterations. We derive our convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets), and we provide bounds on these quantities in the statistically heterogeneous data setting. We also extend our results to the case where clients compute full-batch gradients.
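The abstract does not spell out the filtering routine, so the following is a minimal Python sketch of one standard approach to high-dimensional robust mean estimation, iterative spectral filtering, offered as an illustration of the server-side outlier removal rather than the paper's exact algorithm; the function name and parameters are hypothetical.

```python
import numpy as np

def spectral_filter_mean(updates, iters=5, trim=0.05):
    """Iterative spectral filtering (illustrative, not the paper's exact
    procedure): repeatedly project the client updates onto the top singular
    direction of the centered update matrix, where adversarial vectors must
    concentrate to shift the mean, and drop the most extreme points."""
    pts = np.asarray(updates, dtype=float)  # shape: (num_clients, dim)
    for _ in range(iters):
        mu = pts.mean(axis=0)
        centered = pts - mu
        # Top right singular vector = direction of maximum empirical variance.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        scores = np.abs(centered @ vt[0])
        # Keep points whose projection is not in the extreme `trim` tail.
        keep = scores <= np.quantile(scores, 1.0 - trim)
        if keep.all():
            break
        pts = pts[keep]
    return pts.mean(axis=0)
```

The design rationale is that Byzantine vectors can only move the empirical mean substantially by creating a high-variance direction, which the top singular vector exposes; honest updates, having bounded variance, are rarely removed.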