Recently there has been a significant amount of research on developing consensus-based algorithms for distributed optimization, motivated by applications ranging from large-scale machine learning to wireless sensor networks. This work describes, and proves the convergence of, a new algorithm called Push-Sum Distributed Dual Averaging, which combines a recent optimization algorithm [1] with a push-sum consensus protocol [2]. As we discuss, the use of push-sum has significant advantages: the algorithm is not restricted to doubly stochastic consensus protocols, and convergence to the true average consensus is guaranteed without knowing the stationary distribution of the update matrix in advance. Furthermore, the communication semantics of simply summing the incoming information make this algorithm truly asynchronous and allow a clean analysis when varying intercommunication intervals and communication delays are modelled. We include experiments in simulation and on a small cluster to complement the theoretical analysis.
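The push-sum mechanics alluded to above (each node simply sums the value and weight shares it receives, and the ratio of the two sums converges to the network average) can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation; the function `push_sum` and the mixing matrix `P` are our own illustrative names.

```python
import numpy as np

def push_sum(values, P, iters=200):
    """Sketch of push-sum average consensus (illustrative, assumed helper).

    P must be column-stochastic: column j gives the fractions that
    node j pushes to its out-neighbors (including itself).
    """
    x = np.array(values, dtype=float)  # running value sums
    w = np.ones_like(x)                # running weight sums, all start at 1
    for _ in range(iters):
        x = P @ x                      # each node sums incoming value shares
        w = P @ w                      # and incoming weight shares
    return x / w                       # ratios converge to the true average

# 3-node directed ring: each node keeps half its mass and pushes half forward.
P = np.array([[0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
print(push_sum([1.0, 2.0, 6.0], P))   # every entry approaches the mean, 3.0
```

Note that `P` here is only column-stochastic, not doubly stochastic; the ratio `x / w` still recovers the exact average, which is the property the abstract highlights.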
In this paper we extend and analyze the distributed dual averaging algorithm [1] to handle communication delays and general stochastic consensus protocols. Assuming each network link experiences some fixed, bounded delay, we show that distributed dual averaging converges and that the error decays at a rate O(T^{-1/2}), where T is the number of iterations. For networks of fixed size, this bound improves on [1] by a logarithmic factor in T. Finally, we extend the algorithm to the case of general non-averaging consensus protocols. We prove that the bias this introduces into the optimization can be removed by a simple correction that depends on the stationary distribution of the consensus matrix.

I. INTRODUCTION

In this paper we extend and analyze the distributed dual averaging algorithm [1]. We employ the fixed delay model introduced in [2] and show that distributed dual averaging still converges in the presence of finite, fixed communication delays. In addition, using a different bounding technique than [1], we improve the convergence rate in terms of the number of iterations for a fixed network size by removing a logarithmic factor. Finally, we analyze the case where a general (non-averaging) consensus protocol is used. We explain and illustrate in simulation how the use of non-doubly stochastic consensus matrices biases the optimization. The issue is not essential, however, and we prove that a simple correction removes the bias. Over the last few years, the dramatic increase in available data has made the use of parallel and distributed algorithms imperative for solving large-scale optimization and machine learning problems (see, for example, [3], [4]). Among the numerous possible choices, fully distributed algorithms that combine some form of local optimization with a distributed consensus protocol are an appealing option [1], [4]-[7].
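To make the bias concrete: with a row-stochastic but non-doubly stochastic mixing matrix, repeated averaging converges to the stationary-distribution-weighted average rather than the uniform one. The sketch below, a toy numpy example with our own matrix `W` and pre-scaling step (not the paper's exact correction), shows how rescaling by the stationary distribution recovers the true mean.

```python
import numpy as np

# Row-stochastic but NOT doubly stochastic mixing matrix (toy example).
W = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])

# Stationary distribution pi of W, i.e. pi^T W = pi^T:
evals, evecs = np.linalg.eig(W.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

x = np.array([1.0, 2.0, 6.0])          # local values; true mean is 3.0
mix = np.linalg.matrix_power(W, 200) @ x
# Uncorrected consensus reaches the pi-weighted average at every node,
# which is biased away from 3.0:
print(mix)

# Correction: pre-scale each x_i by 1 / (n * pi_i) before mixing.
n = len(x)
corrected = np.linalg.matrix_power(W, 200) @ (x / (n * pi))
print(corrected)                        # every node approaches 3.0
```

The correction works because the limit of repeated mixing is the inner product with pi, so pi_i * x_i / (n * pi_i) sums to the uniform average; this mirrors the role the stationary distribution plays in the correction described in the abstract.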
With such an approach, all computing nodes play the same role in the optimization procedure, thereby eliminating single points of failure and increasing robustness. This is important in large-scale systems where machines may fail during the computation. The approach also offers greater flexibility for adding computational resources. At the same time, these algorithms are simple to implement and avoid the bookkeeping needed for more intricate hierarchical algorithms. The main focus of this paper is the analysis and extension of the distributed dual averaging algorithm. For practical applications, it is important to know how the algorithm behaves
Considerable effort has been invested in characterizing the convergence rates of gradient-based algorithms for non-linear convex optimization. Recently, motivated by large datasets and problems in machine learning, interest has shifted towards distributed optimization. In this work we present a distributed algorithm for strongly convex constrained optimization. Each node in a network of n computers converges to the optimum of a strongly convex, L-Lipschitz continuous, separable objective at a rate O(log(√n T)/T), where T is the number of iterations. This rate is achieved in the online setting, where data points are revealed one at a time to the nodes, and in the batch setting, where each node has access to its full local dataset from the start. The same convergence rate is achieved in expectation when the subgradients used at each node are corrupted with additive zero-mean noise.