2022
DOI: 10.48550/arxiv.2205.13042
Preprint

How explainable are adversarially-robust CNNs?

Abstract: Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy; (2) out-of-distribution accuracy; and (3) explainability. While these criteria have been studied independently, their relationship is unknown. For example, do CNNs that have stronger out-of-distribution performance also have stronger explainability? Furthermore, most prior feature-importance studies only evaluate methods on 2-3 common vanilla ImageNet-trained CNNs, leaving it unknown how these methods general…


Cited by 3 publications (3 citation statements)
References 46 publications
“…Unlike triplet and contrastive losses, softmax-based losses are not subject to explicit easy/hard sample mining [42,31]. In this section, using a simple toy example, we show that the Softmax-based losses implicitly benefit easy/hard sample mining by their gradient.…”
Section: Complement to Angular-Margin Gradient
Mentioning confidence: 92%
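To make the quoted claim concrete, here is a minimal NumPy sketch (not taken from the citing paper; the example logits are illustrative) showing that the cross-entropy gradient with respect to the true-class logit equals p_true − 1, so it is near zero for easy samples and near −1 for hard ones. This is the implicit easy/hard weighting the statement refers to.

```python
# Toy illustration: softmax cross-entropy implicitly up-weights hard samples,
# because dL/dz_true = p_true - 1 grows in magnitude as the sample gets harder.
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def true_class_logit_grad(logits, true_idx):
    """Gradient of the cross-entropy loss w.r.t. the logit of the true class."""
    p = softmax(logits)
    return p[true_idx] - 1.0             # in [-1, 0]; close to -1 for hard samples

easy = np.array([5.0, 0.0, 0.0])         # confidently correct: p_true ~ 0.99
hard = np.array([0.0, 5.0, 0.0])         # confidently wrong:   p_true ~ 0.007

print(abs(true_class_logit_grad(easy, 0)))   # ~0.01 -> tiny update for easy sample
print(abs(true_class_logit_grad(hard, 0)))   # ~0.99 -> large update for hard sample
```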
“…In other words, these differently trained models guard the sample image against the attributional top-k attack. Recent work by Nourelahi et al (2022) has empirically studied the effectiveness of adversarially (PGD) trained models in obtaining better attributions, e.g., Figure 7(center) shows sharper attributions to features highlighting the ground-truth class.…”
Section: A Stronger Model for Attributional Robustness
Mentioning confidence: 99%
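For reference, a hedged PyTorch sketch of one PGD (L∞) adversarial-training step, the training scheme the statement refers to. The model, optimizer, and hyperparameter values are illustrative assumptions, not the setup used by Nourelahi et al. (2022).

```python
# Sketch of PGD (L-infinity) attack and one adversarial-training step.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient ascent on the loss, constrained to an eps-ball around x."""
    x = x.clone().detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()      # ascend the loss
        x_adv = x + (x_adv - x).clamp(-eps, eps)          # project back to the eps-ball
        x_adv = x_adv.clamp(0, 1)                         # keep a valid image
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Train on PGD adversarial examples instead of clean ones."""
    model.eval()                       # craft the attack with fixed BN statistics
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```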
“…The key concerns related to attributions that we discussed in earlier sections are that: (1) the attribution maps have to be of good quality, and (2) they should be robust under perturbations. It is observed that adversarially robust models obtain the best explanation maps under different attribution methods (Figure 10 from Nourelahi et al [82]) as they seem to be more focused on the objects related to the input's ground-truth class. This section explores the interplay between explanations and adversarial robustness of neural network models, primarily from an image input perspective where adversarial robustness is more studied.…”
Section: Connecting Explanations to Adversarial Robustness
Mentioning confidence: 99%
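As a minimal illustration of the attribution maps discussed here, the following PyTorch sketch computes a vanilla-gradient saliency map, one of the simplest attribution methods. The standard_model/robust_model comparison in the usage comment is an assumption for illustration, not code from the cited survey or the preprint.

```python
# Vanilla-gradient attribution: |d score_target / d input|, reduced over channels.
import torch

def gradient_attribution(model, x, target_class):
    """Return one (H, W) saliency map per image in the batch x of shape (N, C, H, W)."""
    model.eval()
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[:, target_class].sum()   # sum of target-class logits over the batch
    score.backward()
    return x.grad.abs().amax(dim=1)           # (N, H, W) channel-max of gradient magnitudes

# Usage (illustrative): compare maps from a standard and a PGD-trained model.
# map_std    = gradient_attribution(standard_model, images, target_class=0)
# map_robust = gradient_attribution(robust_model,   images, target_class=0)
```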