Shashank Rajput scite author profile

Shashank Rajput

5 Publications

61 Citation Statements Received

233 Citation Statements Given


Publications


Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Wang, Sreenivasan, Rajput et al. 2020. Preprint.


Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, along with methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in an FL model is unlikely assuming first-order oracles or polynomial time. We couple our theoretical results with a new family of backdoor attacks, which we refer to as edge-case backdoors. An edge-case backdoor forces a model to misclassify seemingly easy inputs that are nevertheless unlikely to be part of the training or test data, i.e., they live on the tail of the input distribution. We explain how these edge-case backdoors can lead to unsavory failures and may have serious repercussions on fairness, and show that with careful tuning on the side of the adversary, one can insert them across a range of machine learning tasks (e.g., image classification, OCR, text prediction, sentiment analysis).


Closing the convergence gap of SGD without replacement

Rajput, Gupta, Papailiopoulos. 2020. Preprint.


DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

Rajput, Wang, Charles et al. 2019. Preprint.


To improve the resilience of distributed training to worst-case, or Byzantine, node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and have only limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can tolerate only a limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation. DETOX operates in two steps: a filtering step that uses limited redundancy to significantly reduce the effect of Byzantine nodes, and a hierarchical aggregation step that can be used in tandem with any state-of-the-art robust aggregation method. We show theoretically that this leads to a substantial increase in robustness, and has a per-iteration runtime that can be nearly linear in the number of compute nodes. We provide extensive experiments over real distributed setups across a variety of large-scale machine learning tasks, showing that DETOX leads to orders of magnitude accuracy and speedup improvements over many state-of-the-art Byzantine-resilient approaches.
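The two steps described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the majority vote used for filtering, the coordinate-wise median used as the robust aggregator, and all function and parameter names (`majority_vote`, `filter_then_aggregate`, `redundancy`, `bucket_size`) are assumptions made for illustration. It also assumes honest nodes in the same group report bit-identical gradients.

```python
from collections import Counter
from statistics import median


def majority_vote(group):
    """Return the gradient reported most often within one redundant group."""
    counts = Counter(tuple(g) for g in group)
    winner, _ = counts.most_common(1)[0]
    return list(winner)


def mean(vectors):
    """Coordinate-wise mean of equal-length vectors."""
    return [sum(coords) / len(vectors) for coords in zip(*vectors)]


def coordinate_median(vectors):
    """Coordinate-wise median, standing in for any robust aggregator."""
    return [median(coords) for coords in zip(*vectors)]


def filter_then_aggregate(reports, redundancy, bucket_size):
    """Filter by redundancy, then aggregate hierarchically.

    reports: per-node gradients, ordered so that each consecutive block of
    `redundancy` nodes was assigned the same data (an assumption here).
    """
    # Step 1 (filtering): one voted gradient per redundant group.
    groups = [reports[i:i + redundancy]
              for i in range(0, len(reports), redundancy)]
    voted = [majority_vote(g) for g in groups]
    # Step 2 (hierarchical aggregation): average inside buckets, then
    # apply the robust aggregator across bucket means.
    buckets = [voted[i:i + bucket_size]
               for i in range(0, len(voted), bucket_size)]
    bucket_means = [mean(b) for b in buckets]
    return coordinate_median(bucket_means)


# Nine nodes in three groups of three; one node in the first group lies.
reports = [
    [1.0, 1.0], [100.0, -100.0], [1.0, 1.0],
    [2.0, 2.0], [2.0, 2.0], [2.0, 2.0],
    [3.0, 3.0], [3.0, 3.0], [3.0, 3.0],
]
result = filter_then_aggregate(reports, redundancy=3, bucket_size=1)
```

In this toy run the single Byzantine report is outvoted inside its group, so the robust aggregator only ever sees the three honest group gradients.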


Convergence and Margin of Adversarial Training on Separable Data

Charles, Rajput, Wright et al. 2019. Preprint.


Adversarial training is a technique for training robust machine learning models. To encourage robustness, it iteratively computes adversarial examples for the model, and then re-trains on these examples via some update rule. This work analyzes the performance of adversarial training on linearly separable data, and provides bounds on the number of iterations required for large margin. We show that when the update rule is given by an arbitrary empirical risk minimizer, adversarial training may require exponentially many iterations to obtain large margin. However, if gradient or stochastic gradient update rules are used, only polynomially many iterations are required to find a large-margin separator. By contrast, without the use of adversarial examples, gradient methods may require exponentially many iterations to achieve large margin. Our results are derived by showing that adversarial training with gradient updates minimizes a robust version of the empirical risk at a O(ln(t)^2 / t) rate, despite non-smoothness. We corroborate our theory empirically.

Introduction

Machine learning models trained through standard methods often lack robustness against adversarial examples. These are small perturbations of input examples, designed to "fool" the model into misclassifying the original input [1,2,3,4]. Unfortunately, even small perturbations can cause a large degradation in the test accuracy of popular machine learning models, including deep neural networks [4]. This lack of robustness has spurred a large body of work on designing attack methods for crafting effective adversarial examples [5,6,7,8,9,10] and defense mechanisms for training models that are more robust to norm-bounded perturbations [10,11,12,13,14,15,16].

Adversarial training is a family of optimization-based methods for defending against adversarial perturbations. These methods generally operate by computing adversarial examples, and retraining the model on these examples [2,11,16]. This two-step process is repeated iteratively. While adversarial training methods have achieved empirical success [11,16,17,18], there is currently little theoretical analysis of their convergence and capacity for guaranteeing robustness.

A parallel line of research has investigated whether standard optimization methods, such as gradient descent (GD) and stochastic gradient descent (SGD), exhibit an implicit bias toward robust and generalizable models [19,20,21,22,23,24]. This line of work shows that GD and SGD both converge to the max-margin classifier of linearly separable data, provided that the loss function is chosen appropriately. Notably, the max-margin classifier is the most robust model against ℓ2-bounded perturbations. Thus, gradient descent is indeed biased towards robustness in some settings.
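The two-step loop described above (compute adversarial examples, then update on them) can be sketched for a linear classifier. This is an illustrative sketch, not the paper's code: the logistic loss, the full-batch gradient update, the ℓ2 perturbation budget `eps`, and the names `worst_case_perturbation` and `adversarial_training` are all assumptions. For a linear model the worst-case ℓ2-bounded perturbation has a closed form, which is what the first function computes.

```python
import math


def worst_case_perturbation(w, x, y, eps):
    """Shift x by eps in the l2 direction that most reduces y * <w, x>."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        return list(x)  # every direction is equally bad when w = 0
    return [xi - eps * y * wi / norm for xi, wi in zip(x, w)]


def adversarial_training(data, eps, step, iters):
    """Full-batch gradient descent on the logistic loss of perturbed points.

    data: list of (x, y) pairs with x a list of floats and y in {-1, +1}.
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        for x, y in data:
            # Step 1: adversarial example for the current model.
            x_adv = worst_case_perturbation(w, x, y, eps)
            margin = y * sum(wi * xi for wi, xi in zip(w, x_adv))
            # Step 2: gradient of log(1 + exp(-margin)) with x_adv held fixed.
            coeff = -y / (1.0 + math.exp(margin))
            for j in range(dim):
                grad[j] += coeff * x_adv[j] / len(data)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


# Two linearly separable points on the first axis (toy data).
data = [([2.0, 0.0], 1), ([-2.0, 0.0], -1)]
w = adversarial_training(data, eps=0.5, step=0.5, iters=50)
```

The perturbation lowers the margin of each point by exactly `eps` times the norm of `w`, so a separator that survives it has margin larger than `eps` on the original data.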


Permutation-Based SGD: Is Random Optimal?

Rajput, Lee, Papailiopoulos. 2021. Preprint.


A recent line of ground-breaking results for permutation-based SGD has corroborated a widely observed phenomenon: random permutations offer faster convergence than with-replacement sampling. However, is random optimal? We show that this depends heavily on what functions we are optimizing, and the convergence gap between optimal and random permutations can vary from exponential to nonexistent. We first show that for 1-dimensional strongly convex functions, with smooth second derivatives, there exist optimal permutations that offer exponentially faster convergence compared to random. However, for general strongly convex functions, random permutations are optimal. Finally, we show that for quadratic, strongly-convex functions, there are easy-to-construct permutations that lead to accelerated convergence compared to random. Our results suggest that a general convergence characterization of optimal permutations cannot capture the nuances of individual function classes, and can mistakenly indicate that one cannot do much better than random.
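The two sampling schemes contrasted in the abstract above differ only in how each epoch's index order is drawn. The sketch below is a toy illustration under stated assumptions, not code from the paper: the objective (an average of one-dimensional quadratics), the constant step size, and the names `epoch_orders` and `run_sgd` are all hypothetical.

```python
import random


def epoch_orders(n, epochs, scheme, rng):
    """Yield, for each epoch, the list of component indices to visit."""
    for _ in range(epochs):
        if scheme == "with_replacement":
            # n independent draws: indices may repeat or be skipped.
            yield [rng.randrange(n) for _ in range(n)]
        elif scheme == "random_permutation":
            # A fresh random permutation: every index exactly once.
            perm = list(range(n))
            rng.shuffle(perm)
            yield perm
        else:
            raise ValueError(f"unknown scheme: {scheme}")


def run_sgd(targets, epochs, step, scheme, seed=0):
    """Minimize F(x) = (1/n) * sum_i 0.5 * (x - a_i)^2 over a scalar x."""
    rng = random.Random(seed)
    x = 0.0
    for order in epoch_orders(len(targets), epochs, scheme, rng):
        for i in order:
            x -= step * (x - targets[i])  # gradient of 0.5 * (x - a_i)^2
    return x


targets = [1.0, 2.0, 3.0, 4.0, 5.0]
x_perm = run_sgd(targets, epochs=20, step=0.05, scheme="random_permutation")
x_iid = run_sgd(targets, epochs=20, step=0.05, scheme="with_replacement")
```

A hand-picked ordering, such as the easy-to-construct permutations the abstract mentions for quadratics, could be tried by adding another branch to `epoch_orders`; which ordering is best depends on the function class, as the abstract notes.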

