We report on Krum, the first provably Byzantine-tolerant aggregation rule for distributed Stochastic Gradient Descent (SGD). Krum guarantees the convergence of SGD even in a distributed setting where (asymptotically) up to half of the workers can be malicious adversaries trying to attack the learning system.
The growth of data, the need for scalability, and the complexity of models used in modern machine learning call for distributed implementations. Yet, as of today, distributed machine learning frameworks have largely ignored the possibility of arbitrary (i.e., Byzantine) failures. In this paper, we study the robustness to Byzantine failures at the fundamental level of stochastic gradient descent (SGD), the heart of most machine learning algorithms. Assuming a set of n workers, up to f of them being Byzantine, we ask how robust SGD can be, without limiting the dimension or the size of the parameter space. We first show that no gradient descent update rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the update rule capturing the basic requirements to guarantee convergence despite f Byzantine workers. We finally propose Krum, an update rule that satisfies this resilience property. For a d-dimensional learning problem, the time complexity of Krum is O(n^2 · (d + log n)).
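The abstract above names Krum but does not spell out the rule. The following is a minimal sketch of the rule as it is usually described: each proposed vector is scored by the sum of squared distances to its n − f − 2 nearest neighbours, and the vector with the lowest score is selected. The condition n > 2f + 2, the helper names, and the toy data are assumptions made for illustration and are not taken from the text above. A plain average is included to illustrate the claim that a linear combination does not tolerate a single Byzantine worker.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def krum(gradients: List[Vector], f: int) -> List[float]:
    """Select one proposed vector using the Krum scoring rule (sketch).

    Assumption: n > 2f + 2, the condition usually stated for Krum.
    """
    n = len(gradients)
    if n <= 2 * f + 2:
        raise ValueError("this sketch assumes n > 2f + 2")
    k = n - f - 2  # number of nearest neighbours that enter each score
    best_index, best_score = 0, float("inf")
    for i, g_i in enumerate(gradients):
        # Distances from worker i's proposal to every other proposal.
        dists = sorted(
            squared_distance(g_i, g_j)
            for j, g_j in enumerate(gradients)
            if j != i
        )
        score = sum(dists[:k])
        if score < best_score:
            best_index, best_score = i, score
    return list(gradients[best_index])


def average(gradients: List[Vector]) -> List[float]:
    """Plain coordinate-wise mean, i.e. a linear combination of proposals."""
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]


# Toy data (hypothetical): five honest workers and one Byzantine worker.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.05, 0.95]]
byzantine = [[-1000.0, 1000.0]]
proposals = honest + byzantine

selected = krum(proposals, f=1)  # one of the honest proposals
mean = average(proposals)        # dragged far away by the single outlier
```

In this sketch the pairwise distances cost O(n^2 · d) and the per-worker sorts cost O(n^2 · log n), which is consistent with the complexity quoted in the abstract.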
A message adversary is a daemon that suppresses messages in round-based message-passing synchronous systems in which no process crashes. A property imposed on a message adversary defines a subset of messages that cannot be eliminated by the adversary. It has recently been shown that when a message adversary is constrained by a property denoted TOUR (for tournament), the corresponding synchronous system and the asynchronous crash-prone read/write system have the same computability power for task solvability. This paper introduces new message adversary properties (denoted SOURCE and QUORUM), and shows that the synchronous round-based systems whose adversaries are constrained by these properties are characterizations of classical asynchronous crash-prone systems (1) in which processes communicate through atomic read/write registers or point-to-point message-passing, and (2) enriched with failure detectors such as Ω and Σ. Hence these properties characterize maximal adversaries, in the sense that they define the strongest message adversaries equating classical asynchronous crash-prone systems. They consequently provide strong relations linking round-based synchrony weakened by message adversaries with asynchrony restricted with failure detectors. This not only enriches our understanding of the synchrony/asynchrony duality, but also allows for the establishment of a meaningful hierarchy of property-constrained message adversaries. This article studies the relations between synchronous models weakened by message-suppressing adversaries and asynchronous models strengthened by failure detectors.
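As a rough illustration of what a property-constrained message adversary looks like, the sketch below models one synchronous round in which every process sends to every other process and an adversary drops some messages. The abstract does not define TOUR. The check used here (for each pair of processes, the adversary may suppress at most one of the two messages exchanged between them) is an assumption based on how the tournament property is commonly stated, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Message = Tuple[int, int]  # (sender, receiver)


def satisfies_tour(n: int, suppressed: Set[Message]) -> bool:
    """Assumed TOUR check: no pair of processes loses both directions."""
    return all(
        not ((p, q) in suppressed and (q, p) in suppressed)
        for p, q in combinations(range(n), 2)
    )


def deliver_round(n: int, suppressed: Set[Message]) -> Dict[int, Set[int]]:
    """Return, for each process, the set of senders it hears from this round."""
    if not satisfies_tour(n, suppressed):
        raise ValueError("adversary violates the assumed TOUR constraint")
    return {
        q: {p for p in range(n) if p != q and (p, q) not in suppressed}
        for q in range(n)
    }


# Three processes; the adversary drops one message per pair, forming a cycle.
heard = deliver_round(3, {(0, 1), (1, 2), (2, 0)})
```

Dropping both `(0, 1)` and `(1, 0)` in the same round would be rejected by this check, since it would cut off all communication between that pair.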
This paper continues our quest for the weakest failure detector which allows the k-set agreement problem to be solved in asynchronous message-passing systems prone to any number of process failures. It has two main contributions which (we hope) will be instrumental in completing this quest. The first contribution is a new failure detector (denoted ΠΣ_{x,y}). This failure detector has several noteworthy properties. (a) It is stronger than Σ_x, which has been shown to be necessary. (b) It is equivalent to the pair (Σ, Ω) when x = y = 1 (from which it follows that ΠΣ_{1,1} is optimal to solve consensus). (c) It is equivalent to the pair (Σ_{n−1}, Ω_{n−1}) when x = y = n − 1 (from which it follows that ΠΣ_{n−1,n−1} is optimal for (n − 1)-set agreement). (d) It is strictly weaker than the pair (Σ_x, Ω_y) (which has been investigated in previous works) for the pairs (x, y) such that 1 < y < x < n. (e) It is operational: the paper presents a ΠΣ_{x,y}-based algorithm that solves k-set agreement for k ≥ xy. The second contribution of the paper is a proof that, for 1 < k < n − 1, the eventual leaders failure detector Ω_k (which eventually provides each process with the same set of k process identities, this set including at least one correct process) is not necessary to solve the k-set agreement problem. More precisely, the paper shows that the weakest failure detector for k-set agreement and Ω_k cannot be compared. Keywords: Asynchronous system, Distributed computing, Eventual leader, Failure detector, Fault-tolerance, Message-passing system, Quorum, Reduction, k-Set agreement, Wait-freedom. In search of the weakest failure detector for the set agreement problem. Abstract: This report is a step forward in the search for the weakest failure detector that allows the set agreement problem to be solved in an asynchronous message-passing system.