2021
DOI: 10.1007/s42979-020-00390-x
An Empirical Study on the Relation Between Network Interpretability and Adversarial Robustness

Abstract: Deep neural networks (DNNs) have had many successes, but they suffer from two major issues: (1) a vulnerability to adversarial examples and (2) a tendency to elude human interpretation. Interestingly, recent empirical and theoretical evidence suggests that these two seemingly disparate issues are actually connected. In particular, robust models tend to provide more interpretable gradients than non-robust models. However, whether this relationship works in the opposite direction remains obscure. With this paper…

Cited by 31 publications (15 citation statements)
References 25 publications
“…Zhou et al (2020) propose to evaluate attribution methods through dataset modification. Noack et al (2021) show that image recognition models can achieve better adversarial robustness when they are trained to have interpretable gradients. To the best of our knowledge, we are the first to quantify the performance of rationale models under textual adversarial attacks and understand whether rationalization can inherently provide robustness.…”
Section: Related Work (mentioning)
confidence: 99%
“…Explainability of robust models: Robust models were reported to have more interpretable gradient images [5,35,37,44,60] than those of vanilla CNNs. However, it is not yet known whether this superiority in interpretability remains when state-of-the-art AM methods are used.…”
Section: Related Work (mentioning)
confidence: 99%