Today's massive datasets often make it necessary to perform computations on them in a distributed manner. In principle, a computational task is divided into subtasks that are distributed over a cluster operated by a taskmaster. One issue faced in practice is the delay incurred by slow machines, known as stragglers. Several schemes, including replication-based ones, have been proposed in the literature to mitigate the effects of stragglers, and more recently, schemes inspired by coding theory have begun to gain traction. In this work, we consider a distributed gradient descent setting suitable for a wide class of machine learning problems. We adapt the framework of Tandon et al. [18] and present a deterministic scheme that, for a prescribed per-machine computational effort, recovers the gradient from the smallest number of machines f theoretically permissible, via an O(f^2) decoding algorithm. We also provide a theoretical delay model that can be used to minimize the expected waiting time per computation by optimally choosing the parameters of the scheme. Finally, we supplement our theoretical findings with numerical results that demonstrate the efficacy of the method and its advantages over competing schemes.

2. We provide an efficient online decoder, with time complexity O(f^2), for recovering the gradient from any f machines; this is faster than the best known method [18], which requires O(f^3).
3. We analyze the total computation time and provide a method for finding the optimal coding parameters. We consider heavy-tailed delays, which have been widely observed in CPU job runtimes in practice [11,7,8].

The rest of the paper is organized as follows. In Section 2, we describe the problem setup and explain the design objectives in detail. Section 3 provides the construction of our coding scheme, based on the idea of balanced Reed-Solomon codes. Our efficient online decoder is presented in Section 4. We then characterize the total computation time and describe the optimal choice of coding parameters in Section 5. Finally, we provide our numerical results in Section 6 and conclude in Section 7.
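The scheme itself is built from balanced Reed-Solomon codes, but the underlying gradient-coding idea can be seen in a small sketch. The following is the classic 3-machine, 1-straggler example in the style of Tandon et al. [18]; the encoding matrix B here is illustrative and is not the Reed-Solomon construction of this paper.

```python
import numpy as np

# Gradient coding toy example: 3 machines, dataset split into 3 parts
# with partial gradients g1, g2, g3; each machine transmits one coded
# linear combination of two partial gradients.
rng = np.random.default_rng(0)
g = rng.standard_normal((3, 4))   # three partial gradients in R^4
full = g.sum(axis=0)              # the full gradient we want to recover

# Encoding matrix B: row i gives machine i's linear combination.
# Designed so that any 2 rows span a vector with a^T B_S = (1, 1, 1).
B = np.array([[0.5, 1.0,  0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0,  1.0]])
sent = B @ g                      # what each machine transmits

# Decode from ANY f = 2 machines: find combining weights a such that
# a^T B_S equals the all-ones vector, then combine the received rows.
recoveries = {}
for S in [(0, 1), (0, 2), (1, 2)]:
    a, *_ = np.linalg.lstsq(B[list(S), :].T, np.ones(3), rcond=None)
    recoveries[S] = a @ sent[list(S), :]
```

Each `recoveries[S]` equals the full gradient, so the master can proceed as soon as any two of the three machines respond.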
Solving a large-scale system of linear equations is a key step at the heart of many algorithms in machine learning, scientific computing, and beyond. When the problem dimension is large, computational and/or memory constraints make it desirable, or even necessary, to perform the task in a distributed fashion. In this paper, we consider a common scenario in which a taskmaster intends to solve a large-scale system of linear equations by distributing subsets of the equations among a number of computing machines/cores. We propose an accelerated distributed consensus algorithm, in which at each iteration every machine updates its solution by adding a scaled version of the projection of an error signal onto the nullspace of its system of equations, and where the taskmaster conducts an averaging over the solutions with momentum. The convergence behavior of the proposed algorithm is analyzed in detail and analytically shown to compare favorably with the convergence rate of alternative distributed methods, namely distributed gradient descent, distributed versions of Nesterov's accelerated gradient descent and heavy-ball method, the block Cimmino method, and ADMM. On randomly chosen linear systems, as well as on real-world data sets, the proposed method offers significant speed-up relative to all the aforementioned methods. Finally, our analysis suggests a novel variation of the distributed heavy-ball method, which employs a particular distributed preconditioning, and which achieves the same theoretical convergence rate as the proposed consensus-based method.

Solving large-scale systems of linear equations is a fundamental task in engineering and the sciences. In particular, we consider the setting in which a taskmaster intends to solve a large-scale system of equations with the help of a set of computing machines/cores (Figure 1). This problem can in general be cast as an optimization problem with a cost function that is separable in the data (but not in the variables).
Hence, there are general approaches to constructing distributed algorithms for this problem, such as distributed versions of gradient descent and its variants (e.g., Nesterov's accelerated gradient [15], the heavy-ball method [16], etc.), where each machine computes the partial gradient corresponding to a term in the cost and the taskmaster then aggregates the partial gradients by summing them, as well as the so-called Alternating Direction Method of Multipliers (ADMM) and its variants [3]. Among others, some recent approaches for Distributed Gradient Descent (DGD) have been presented and analyzed in [23], [17], and [21], and coding techniques for robust DGD in the presence of failures and straggler machines have been studied in [10,20]. ADMM has been widely used [7,5,22] for solving various convex optimization problems in a distributed way, and in particular for consensus optimization [13,18,12], which is the relevant formulation for the type of separation that we have here.
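The projection-plus-averaging idea described above can be sketched concretely. The following is a minimal illustration, not the paper's exact algorithm: each machine projects the current iterate onto the affine solution set of its own block of equations, and the master averages the results with heavy-ball-style momentum. The block sizes, momentum coefficient, and iteration count are all illustrative choices, not tuned values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, k = 12, 6, 3                  # 12 equations, 6 unknowns, 3 machines
A = rng.standard_normal((m, d))
x_true = rng.standard_normal(d)
b = A @ x_true                      # consistent linear system A x = b
A_parts = np.split(A, k)            # machine i holds (A_i, b_i)
b_parts = np.split(b, k)

# Projection of x onto the affine set {z : A_i z = b_i} is
# x + A_i^+ (b_i - A_i x): the correction lies in the row space of A_i,
# while the component of x in the nullspace of A_i is left untouched.
pinvs = [np.linalg.pinv(Ai) for Ai in A_parts]

def project(i, x):
    return x + pinvs[i] @ (b_parts[i] - A_parts[i] @ x)

beta = 0.3                          # illustrative momentum coefficient
x = np.zeros(d)
x_prev = x.copy()
for _ in range(3000):
    # each machine projects in parallel; the master averages with momentum
    avg = np.mean([project(i, x) for i in range(k)], axis=0)
    x, x_prev = avg + beta * (x - x_prev), x
```

Since the stacked system has full column rank, the averaged projection map is a contraction toward the solution, and the momentum term accelerates the linear convergence in the usual heavy-ball fashion.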
Abstract: Aggregators are playing an increasingly crucial role in the integration of renewable generation in power systems. However, the intermittent nature of renewable generation makes market interactions of aggregators difficult to monitor and regulate, raising concerns about potential market manipulation by aggregators. In this paper, we study this issue by quantifying the profit an aggregator can obtain through strategic curtailment of generation in an electricity market. We show that, while the problem of maximizing the benefit from curtailment is hard in general, efficient algorithms exist when the topology of the network is radial (acyclic). Further, we highlight that significant increases in profit are possible via strategic curtailment in practical settings.
Abstract: We study SIRS (Susceptible-Infected-Recovered-Susceptible) spreading processes over complex networks, by considering the exact 3^n-state Markov chain model. The Markov chain model exhibits an interesting connection with its 2^n-state nonlinear "mean-field" approximation and the latter's corresponding linear approximation. We show that under the specific threshold where the disease-free state is a globally stable fixed point of both the linear and nonlinear models, the exact underlying Markov chain has an O(log n) mixing time, which means the epidemic dies out quickly. In fact, the epidemic eradication condition coincides for all three models. Furthermore, when the threshold condition is violated, which indicates that the linear model is not stable, we show that there exists a unique second fixed point for the nonlinear model, which corresponds to the endemic state. We also investigate the effect of adding immunization to the SIRS epidemics by introducing two different models, depending on the efficacy of the vaccine. Our results indicate that immunization improves the threshold of epidemic eradication. Furthermore, the common threshold for fast mixing of the Markov chain and global stability of the disease-free fixed point improves by the same factor for the vaccination-dominant model.
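The nonlinear mean-field approximation mentioned above can be sketched as an n-dimensional ODE system and simulated directly. The parameterization below (infection rate beta, curing rate delta, immunity-loss rate gamma, adjacency matrix A) is a standard SIRS convention assumed for illustration and is not necessarily the paper's notation; it shows the epidemic dying out when beta * lambda_max(A) / delta is below 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T                                # undirected graph, no self-loops
lam_max = np.max(np.linalg.eigvalsh(A))    # spectral radius of A

delta, gamma = 0.2, 0.1                    # curing rate, immunity-loss rate
beta = 0.5 * delta / lam_max               # below threshold: beta*lam_max/delta = 0.5 < 1

# Euler discretization of the mean-field SIRS ODEs, where I_i and R_i
# approximate the probabilities that node i is infected / recovered:
#   dI_i/dt = beta * S_i * sum_j A_ij I_j - delta * I_i
#   dR_i/dt = delta * I_i - gamma * R_i,   with S_i = 1 - I_i - R_i
I = np.full(n, 0.5)                        # start with a large outbreak
R = np.zeros(n)
dt = 0.05
for _ in range(20000):
    S = 1.0 - I - R
    I, R = (I + dt * (beta * S * (A @ I) - delta * I),
            R + dt * (delta * I - gamma * R))
```

Below the threshold the linearization around the disease-free state is stable, so the infection probabilities decay to zero, consistent with the global stability result stated in the abstract.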