In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures, i.e., arbitrary and potentially adversarial behavior, in a setting where every participating agent is prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behavior at each time instant. We place no restriction on the maximum number of Byzantine agents at any given time. Our method relies on three layers of defense: 1) temporal gradient averaging, 2) robust aggregation, and 3) gradient normalization. We study two settings for stochastic optimization, namely Sample Average Approximation and Stochastic Approximation, and prove that for strongly convex and for smooth non-convex cost functions, our algorithm achieves order-optimal statistical error and convergence rates.
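The abstract names the ingredients of the method but not their exact form, so the following is only a minimal Python sketch under stated assumptions, not the algorithm analyzed in this paper. Each agent's trustworthy/Byzantine state evolves as a two-state Markov chain with hypothetical transition probabilities `p_fail` and `p_recover`. Temporal averaging is assumed to be a per-agent exponential moving average with weight `beta`. The robust aggregator is a swapped-in coordinate-wise median, and normalization rescales the aggregate to unit norm. All function names (`step_markov_states`, `temporal_average`, `coordinate_median`, `normalize`, `robust_step`, `run_demo`) and all numeric parameters are hypothetical.

```python
import math
import random
import statistics
from typing import List

Vector = List[float]


def step_markov_states(states: List[bool], p_fail: float, p_recover: float,
                       rng: random.Random) -> List[bool]:
    """Advance each agent's two-state Markov chain by one time instant.

    states[i] is True if agent i is Byzantine. p_fail (assumed) is the
    probability that a trustworthy agent turns Byzantine; p_recover (assumed)
    is the probability that a Byzantine agent becomes trustworthy again.
    """
    return [
        rng.random() >= p_recover if byz else rng.random() < p_fail
        for byz in states
    ]


def temporal_average(prev: Vector, msg: Vector, beta: float) -> Vector:
    """Layer 1 (assumed form): exponential moving average of one agent's messages."""
    return [beta * p + (1.0 - beta) * m for p, m in zip(prev, msg)]


def coordinate_median(vectors: List[Vector]) -> Vector:
    """Layer 2 (swapped-in aggregator): coordinate-wise median across agents."""
    return [statistics.median(coords) for coords in zip(*vectors)]


def normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Layer 3: rescale to unit Euclidean norm (a zero vector stays zero)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / max(norm, eps) for c in v]


def robust_step(x: Vector, averaged_msgs: List[Vector], lr: float) -> Vector:
    """One server update: aggregate robustly, normalize, then take a descent step."""
    direction = normalize(coordinate_median(averaged_msgs))
    return [xi - lr * di for xi, di in zip(x, direction)]


def run_demo(num_agents: int = 10, steps: int = 200, seed: int = 0) -> Vector:
    """Toy run on f(x) = 0.5 * ||x - target||^2 with hypothetical parameters."""
    rng = random.Random(seed)
    target = [1.0, -2.0]
    x = [0.0, 0.0]
    states = [False] * num_agents
    avgs = [[0.0, 0.0] for _ in range(num_agents)]
    for t in range(steps):
        states = step_markov_states(states, p_fail=0.05, p_recover=0.5, rng=rng)
        for i, byz in enumerate(states):
            if byz:
                msg = [100.0, 100.0]  # arbitrary adversarial message
            else:
                # Honest stochastic gradient of f: (x - target) plus Gaussian noise.
                msg = [xi - ti + rng.gauss(0.0, 0.1) for xi, ti in zip(x, target)]
            avgs[i] = temporal_average(avgs[i], msg, beta=0.5)
        x = robust_step(x, avgs, lr=1.0 / math.sqrt(t + 1))
    return x
```

In this sketch, normalization caps each update at length `lr` no matter how far Byzantine messages pull the median, which is one way to see why a bounded step helps when no limit is placed on the number of Byzantine agents at a given time.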
I. INTRODUCTION

The convenience they offer for large-scale data processing, privacy preservation, and parallel algorithm execution has made the design of distributed optimization algorithms an attractive field of research [1]-[7]. However, the distributed nature of such methods, for example, physically separated servers connected over a network, exposes the system to vulnerabilities not faced by their traditional centralized counterparts [8]. The robustness and security of distributed methods must therefore be taken into account when assessing algorithm performance [2].

In a centralized system, data can be cleaned, faultless computation can be ensured by reliable hardware, and communication requirements are minimal. Typical distributed algorithms, on the other hand, assume trustworthy data, faultless computation, and reliable communication, even though none of these is guaranteed. Privacy constraints might not allow for data corruption checks, while distributed computing infrastructure, e.g., personal devices, increases the likelihood of faulty hardware [9]. Lastly, unreliable communication might occur due to noisy wireless channels or, more importantly, due to man-in-the-middle adversarial attacks, in which an adversary takes over network sub-systems and arbitrarily alters the information stored in and communicated between the machines to prevent convergence to the optimal solution, i.e., Byzantine attacks [10].

Robust distributed optimization under adversarial manipulation has been studied for various corruption models, see [11],