Data-Dependent Convergence for Consensus Stochastic Optimization

Bijral, Avleen S.; Sarwate, Anand D.; Srebro, Nathan

doi:10.1109/tac.2017.2671377

Cited by 13 publications

(5 citation statements)

References 17 publications


“…Each node only synchronizes with its neighbors, thus reducing the communication overhead significantly. Decentralized averaging has a long history in the distributed and consensus optimization community (Tsitsiklis et al, 1986; Nedic & Ozdaglar, 2009; Duchi et al, 2012; Tsianos et al, 2012; Zeng & Yin, 2016; Yuan et al, 2016; Sirb & Ye, 2018; Bijral et al, 2017). Most of these works are for gradient descent or dual averaging methods rather than stochastic gradient descent (SGD), and they do not allow workers to make local updates.…”

Section: Introduction (mentioning)

confidence: 99%

Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms

Wang, Joshi

2018

Preprint

103

186


Communication-efficient SGD algorithms, which allow nodes to perform local updates and periodically synchronize local models, are highly effective in improving the speed and scalability of distributed SGD. However, a rigorous convergence analysis and comparative study of different communication-reduction strategies remains a largely open problem. This paper presents a unified framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as periodic-averaging, elastic-averaging and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover, this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence with low error floor.


“…Many papers (e.g., [41], [42]) use the term in the SA sense described here, with a continuous stream of data in which no sample is used more than once. However, other papers (e.g., [2], [43]) use the term within the ERM framework to describe algorithms that operate on a fixed dataset, from which mini-batches of data are sampled with replacement and noisy gradients are computed. To disambiguate, some authors (e.g., [44]) use the term single-pass SGD to indicate the former usage.…”

Section: Stochastic Approximation (SA) (mentioning)

confidence: 99%

Scaling-up Distributed Processing of Data Streams for Machine Learning

Nokleby, Raja, Bajwa

2020

Preprint


Emerging applications of machine learning in numerous areas (including online social networks, remote sensing, internet-of-things systems, smart grids, and more) involve continuous gathering of and learning from streams of data samples. Real-time incorporation of streaming data into the learned machine learning models is essential for improved inference in these applications. Further, these applications often involve data that are either inherently gathered at geographically distributed entities due to physical reasons (e.g., internet-of-things systems and smart grids) or that are intentionally distributed across multiple computing machines for memory, storage, computational, and/or privacy reasons. Training of machine learning models in this distributed, streaming setting requires solving stochastic optimization problems in a collaborative manner over communication links between the physical entities. When the streaming data rate is high compared to the processing capabilities of individual computing entities and/or the rate of the communications links, this poses a challenging question: how can one best leverage the incoming data for distributed training of machine learning models under constraints on computing capabilities and/or communications rate? A large body of research in distributed online optimization has emerged in recent decades to tackle this and related problems. This paper reviews recently developed methods that focus on large-scale distributed stochastic optimization in the compute- and bandwidth-limited regime, with an emphasis on convergence analysis that explicitly accounts for the mismatch between computation, communication and streaming rates, and that provides sufficient conditions for order-optimum convergence. In particular, it focuses on methods that solve: (i) distributed stochastic convex problems, and (ii) distributed principal component analysis, which is a nonconvex problem with geometric structure that permits global convergence. For such methods, the paper discusses recent advances in terms of distributed algorithmic designs when faced with high-rate streaming data. Further, it reviews theoretical guarantees underlying these methods, which show there exist regimes in which systems can learn from distributed processing of streaming data at order-optimal rates, nearly as fast as if all the data were processed at a single super-powerful machine.


“…Notably, consensus dynamics provide a foundation for decentralized optimization algorithms, which strategically implement consensus using synchronous or asynchronous gossiping between nodes [26,27]. Such algorithms are often employed to take advantage of distributed computing infrastructure to more efficiently train machine learning models, such as support vector machines [28,29] and deep neural networks [30][31][32][33]. For such systems, each node trains a local model on local data, and at the same time, communication between nodes enables them to reach a consensus on what the model parameter should be.…”

Section: Introduction (mentioning)

confidence: 99%

Balanced Hodge Laplacians Optimize Consensus Dynamics over Simplicial Complexes

Ziegler, Skardal, Dutta et al.

2021

Preprint


Despite the vast literature on network dynamics, we still lack basic insights into dynamics on higher-order structures (e.g., edges, triangles, and more generally, k-dimensional "simplices") and how they are influenced through higher-order interactions. A prime example lies in neuroscience where groups of neurons (not individual ones) may provide the building blocks for neurocomputation. Here, we study consensus dynamics on edges in simplicial complexes using a type of Laplacian matrix called a Hodge Laplacian, which we generalize to allow higher- and lower-order interactions to have different strengths. Using techniques from algebraic topology, we study how collective dynamics converge to a low-dimensional subspace that corresponds to the homology space of the simplicial complex. We use the Hodge decomposition to show that higher- and lower-order interactions can be optimally balanced to maximally accelerate convergence, and that this optimum coincides with a balancing of dynamics on the curl and gradient subspaces. We additionally explore the effects of network topology, finding that consensus over edges is accelerated when 2-simplices are well dispersed, as opposed to clustered together.
