2018
DOI: 10.48550/arxiv.1810.06583
Preprint

Concise Explanations of Neural Networks using Adversarial Training

Prasad Chalasani,
Jiefeng Chen,
Amrita Roy Chowdhury
et al.

Abstract: We show new connections between adversarial learning and explainability. One form of explanation of the output of a neural network model in terms of its input features is a vector of feature attributions computed with the Integrated Gradients (IG) method. Two desirable characteristics of an attribution-based explanation are: (1) sparseness: the attributions of irrelevant or weakly relevant features should be negligible, thus resulting in concise explanations in terms of the significant features, and (2) stability: it s…
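
The abstract refers to Integrated Gradients attributions. A minimal sketch of how such an attribution vector is typically computed is shown below, assuming a differentiable PyTorch classifier `model`, an input `x`, a baseline `x_baseline`, and a target class index (all hypothetical names); this illustrates the standard IG approximation, not the authors' own implementation.

```python
import torch

def integrated_gradients(model, x, x_baseline, target, steps=50):
    """Approximate Integrated Gradients attributions for one input.

    IG_i(x) = (x_i - x'_i) * \int_0^1 dF(x' + a (x - x')) / dx_i da,
    approximated here by a Riemann sum over `steps` points on the path.
    """
    # Points along the straight-line path from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = x_baseline + alphas * (x - x_baseline)        # (steps, *x.shape)
    path.requires_grad_(True)

    # Gradients of the target logit w.r.t. each point on the path
    # (assumes model outputs per-class scores of shape (batch, num_classes)).
    outputs = model(path)[:, target].sum()
    grads = torch.autograd.grad(outputs, path)[0]        # (steps, *x.shape)

    # Average gradient along the path, scaled by the input-baseline gap.
    return (x - x_baseline) * grads.mean(dim=0)
```

The resulting vector has one attribution per input feature; the sparseness and stability properties discussed in the paper are statements about this vector.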

Cited by 1 publication (1 citation statement)
References 21 publications
“…At their core, they rely on the importance distribution onto the entire set of features. For example, Chalasani et al [178] use the Gini-index of the feature attribution vector as a measurement of its sparseness. The sparser the vector, the lower is the complexity of the explanation, since fewer features are required to explain the model behavior.…”
Section: Computational Evaluation
confidence: 99%
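
The citation statement above describes using the Gini index of the attribution vector as a sparseness measure. A short sketch of that idea, using the standard Gini-index formulation on absolute attribution values (not necessarily the exact variant used by Chalasani et al.), might look as follows.

```python
import numpy as np

def gini_index(attributions):
    """Gini index of an attribution vector as a sparseness measure.

    Absolute values are sorted in ascending order and weighted by rank.
    The result lies in [0, 1): larger values mean the attribution mass is
    concentrated on fewer features, i.e. the explanation is more concise.
    """
    a = np.sort(np.abs(np.asarray(attributions, dtype=float)))
    n = a.size
    total = a.sum()
    if total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)                       # 1, 2, ..., n (ascending)
    return 1.0 - 2.0 * np.sum((a / total) * (n - ranks + 0.5) / n)

# Example: a concentrated vector scores higher than a uniform one.
print(gini_index([0.0, 0.0, 0.0, 1.0]))      # 0.75, the maximum for n = 4
print(gini_index([0.25, 0.25, 0.25, 0.25]))  # 0.0, maximally dense
```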