We consider the distributed statistical learning problem over decentralized systems that are prone to adversarial attacks. This setup arises in many practical applications, including Google's Federated Learning. Formally, we focus on a decentralized system that consists of a parameter server and m working machines; each working machine keeps N/m data samples, where N is the total number of samples. In each iteration, up to q of the m working machines suffer Byzantine faults -- a faulty machine in the given iteration behaves arbitrarily badly against the system and has complete knowledge of the system. Additionally, the sets of faulty machines may be different across iterations. Our goal is to design robust algorithms such that the system can learn the underlying true parameter, which is of dimension d, despite the interruption of the Byzantine attacks. In this paper, based on the geometric median of means of the gradients, we propose a simple variant of the classical gradient descent method. We show that our method can tolerate q Byzantine failures up to 2(1+ε)q łe m for an arbitrarily small but fixed constant ε>0. The parameter estimate converges in O(łog N) rounds with an estimation error on the order of max √dq/N, ~√d/N , which is larger than the minimax-optimal error rate √d/N in the centralized and failure-free setting by at most a factor of √q . The total computational complexity of our algorithm is of O((Nd/m) log N) at each working machine and O(md + kd log 3 N) at the central server, and the total communication cost is of O(m d log N). We further provide an application of our general results to the linear regression problem. A key challenge arises in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. To handle this issue in the analysis, we prove that the aggregated gradient, as a function of model parameter, converges uniformly to the true gradient function.
Given a set of pairwise comparisons, the classical ranking problem computes a single ranking that best represents the preferences of all users. In this paper, we study the problem of inferring individual preferences, arising in the context of making personalized recommendations. In particular, we assume that there are n users of r types; users of the same type provide similar pairwise comparisons for m items according to the BradleyTerry model. We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are r max{m, n} log m log 2 n pairwise comparisons per type, which is near optimal in sample complexity when r only grows logarithmically with m or n. Our algorithm has three steps: first, for each user, compute the net-win vector which is a projection of its m 2 -dimensional vector of pairwise comparisons onto an m-dimensional linear subspace; second, cluster the users based on the net-win vectors; third, estimate a single preference for each cluster separately. The net-win vectors are much less noisy than the high dimensional vectors of pairwise comparisons and clustering is more accurate after the projection as confirmed by numerical experiments. Moreover, we show that, when a cluster is only approximately correct, the maximum likelihood estimation for the Bradley-Terry model is still close to the true preference.
This paper studies seeded graph matching for power-law graphs. Assume that two edge-correlated graphs are independently edge-sampled from a common parent graph with a power-law degree distribution. A set of correctly matched vertex-pairs is chosen at random and revealed as initial seeds. Our goal is to use the seeds to recover the remaining latent vertex correspondence between the two graphs. Departing from the existing approaches that focus on the use of high-degree seeds in $1$-hop neighborhoods, we develop an efficient algorithm that exploits the low-degree seeds in suitably-defined D-hop neighborhoods. Specifically, we first match a set of vertex-pairs with appropriate degrees (which we refer to as the first slice) based on the number of low-degree seeds in their D-hop neighborhoods. This approach significantly reduces the number of initial seeds needed to trigger a cascading process to match the rest of graphs. Under the Chung-Lu random graph model with n vertices, max degree Θ(√n), and the power-law exponent 2<β<3, we show that as soon as D> 4-β/3-β, by optimally choosing the first slice, with high probability our algorithm can correctly match a constant fraction of the true pairs without any error, provided with only Ω((log n)4-β) initial seeds. Our result achieves an exponential reduction in the seed size requirement, as the best previously known result requires n1/2+ε seeds (for any small constant ε>0). Performance evaluation with synthetic and real data further corroborates the improved performance of our algorithm.
We consider unreliable distributed learning systems wherein the training data is kept confidential by external workers, and the learner has to interact closely with those workers to train a model. In particular, we assume that there exists a system adversary that can adaptively compromise some workers; the compromised workers deviate from their local designed specifications by sending out arbitrarily malicious messages. We assume in each communication round, up to q out of the m workers suffer Byzantine faults. Each worker keeps a local sample of size n and the total sample size is N=nm. We propose a secured variant of the gradient descent method that can tolerate up to a constant fraction of Byzantine workers, i.e., q/m = O(1). Moreover, we show the statistical estimation error of the iterates converges in O(log N) rounds to O(√/N + √/N ), where d is the model dimension. As long as q=O(d), our proposed algorithm achieves the optimal error rate O(√/N $. Our results are obtained under some technical assumptions. Specifically, we assume strongly-convex population risk. Nevertheless, the empirical risk (sample version) is allowed to be non-convex. The core of our method is to robustly aggregate the gradients computed by the workers based on the filtering procedure proposed by Steinhardt et al. On the technical front, deviating from the existing literature on robustly estimating a finite-dimensional mean vector, we establish a uniform concentration of the sample covariance matrix of gradients, and show that the aggregated gradient, as a function of model parameter, converges uniformly to the true gradient function. To get a near-optimal uniform concentration bound, we develop a new matrix concentration inequality, which might be of independent interest.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.