2021
DOI: 10.48550/arxiv.2106.08545
Preprint

A Survey on Fault-tolerance in Distributed Optimization and Machine Learning

Shuo Liu

Abstract: The robustness of distributed optimization is an emerging field of study, motivated by various applications of distributed optimization including distributed machine learning, distributed sensing, and swarm robotics. With the rapid expansion of the scale of distributed systems, resilient distributed algorithms for optimization are needed, in order to mitigate system failures, communication issues, or even malicious attacks. This survey investigates the current state of fault-tolerance research in distributed o…

Cited by 2 publications (2 citation statements)
References 87 publications
“…It is well known that designing algorithms robust to adversarial attacks is nontrivial and challenging for multi-agent consensus, distributed learning, decentralized optimization, etc. (e.g., [92]-[94]). In this respect, adversarial agents were considered recently for DOL in [95], where Byzantine faulty agents can update their variables arbitrarily, which are then transmitted to their neighbors, with the purpose of preventing non-faulty agents from achieving the optimal solution; individual static regret is ensured by establishing sufficient conditions on the graph topology and on the number and location of the adversarial agents.…”
Section: Communication Perspective
confidence: 99%
“…where H is a set of non-faulty agents. Various methods have been proposed to solve Byzantine fault-tolerant optimization or learning problems [34], including robust gradient aggregation [5,11], gradient coding [10], and other methods [52,55].…”
Section: Related Work
confidence: 99%
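As a rough illustration of the robust gradient aggregation idea mentioned in the citation above (a generic coordinate-wise trimmed mean, not the specific methods of [5] or [11]; the function name and parameters are hypothetical):

```python
import numpy as np

def trimmed_mean_aggregate(gradients, f):
    """Coordinate-wise trimmed mean over worker gradients.

    gradients: list of n 1-D arrays (one gradient per worker).
    f: assumed upper bound on the number of Byzantine workers (requires n > 2f).
    For each coordinate, the f largest and f smallest values are discarded
    before averaging, so up to f arbitrary (Byzantine) gradients cannot pull
    the aggregate outside the range spanned by the honest workers' values.
    """
    g = np.sort(np.stack(gradients), axis=0)  # sort each coordinate independently
    return g[f : len(gradients) - f].mean(axis=0)

# Example: 4 honest workers near the true gradient, 1 Byzantine outlier.
honest = [np.array([1.0, -2.0]) + 0.1 * i for i in range(4)]
byzantine = [np.array([1e6, -1e6])]
agg = trimmed_mean_aggregate(honest + byzantine, f=1)  # ≈ [1.2, -1.9]
```

Despite the Byzantine worker sending an enormous gradient, the trimmed aggregate stays within the honest workers' range, which is the basic guarantee robust aggregation rules aim for.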