Adam Noack scite author profile

Adam Noack

4 Publications

17 Citation Statements Received

46 Citation Statements Given


Affiliations

University of Oregon

Publications


An Empirical Study on the Relation Between Network Interpretability and Adversarial Robustness

et al. 2021


Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical evidence suggests that these two seemingly disparate issues are actually connected. In particular, robust models tend to provide more interpretable gradients than non-robust models. However, whether this relationship works in the opposite direction remains obscure. With this paper, we seek empirical answers to the following question: can models acquire adversarial robustness when they are trained to have interpretable gradients? We introduce a theoretically inspired technique called Interpretation Regularization (IR), which encourages a model's gradients to (1) match the direction of interpretable target salience maps and (2) have small magnitude. To assess model performance and tease apart factors that contribute to adversarial robustness, we conduct extensive experiments on MNIST and CIFAR-10 with both ℓ2 and ℓ∞ attacks. We demonstrate that training the networks to have interpretable gradients improves their robustness to adversarial perturbations. Applying the network interpretation technique SmoothGrad [59] yields additional performance gains, especially in cross-norm attacks and under heavy perturbations. The results indicate that the interpretability of the model gradients is a crucial factor for adversarial robustness. Code for the experiments can be found at https://github.com/a1noack/interp_regularization.


What Models Know About Their Attackers: Deriving Attacker Information From Latent Representations

Xie, Brophy, Noack et al. 2021


Adversarial attacks curated against NLP models are increasingly becoming practical threats. Although various methods have been developed to detect adversarial attacks, securing learning-based NLP systems in practice would require more than identifying and evading perturbed instances. To address these issues, we propose a new set of adversary identification tasks, Attacker Attribute Classification via Textual Analysis (AACTA), that attempts to obtain more detailed information about the attackers from adversarial texts. Specifically, given a piece of adversarial text, we hope to accomplish tasks such as localizing perturbed tokens, identifying the attacker's access level to the target model, determining the evasion mechanism imposed, and specifying the perturbation type employed by the attacking algorithm. Our contributions are as follows: we formalize the task of classifying attacker attributes and create a benchmark on various target models from the sentiment classification and abuse detection domains. We show that signals from BERT models and target models can be used to train classifiers that reveal the properties of the attacking algorithms. We demonstrate that adversarial attacks leave interpretable traces in the feature space of both pre-trained language models and target models, making AACTA a promising direction towards more trustworthy NLP systems.


Identifying Adversarial Attacks on Text Classifiers

Xie, Brophy, Noack et al. 2022

Preprint


An Empirical Study on the Relation between Network Interpretability and Adversarial Robustness

Noack, Ahern, Dou et al. 2019

Preprint



