Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
2019 · Preprint
DOI: 10.48550/arxiv.1911.02508

Cited by 14 publications (19 citation statements) · References 0 publications
“…LIME, and to a lesser extent SHAP, have been demonstrated to provide unreliable interpretations in some cases. For instance, LIME is strongly influenced by the chosen kernel width parameter (Slack et al 2019). In Section 6, we compare our new class of fMEs to LIME.…”
Section: Interpretable Machine Learning
Mentioning (confidence: 99%)
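To make the kernel-width sensitivity concrete, here is a minimal sketch (not taken from either cited paper; the dataset, classifier, and width values are illustrative assumptions) that explains the same instance with LIME under several kernel_width settings and typically produces different feature attributions:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Illustrative data and model; any tabular classifier with predict_proba works.
iris = load_iris()
model = RandomForestClassifier(random_state=0).fit(iris.data, iris.target)
instance = iris.data[0]

# Same instance, same model: only the kernel width changes.
for kernel_width in (0.25, 0.75, 3.0):
    explainer = LimeTabularExplainer(
        iris.data,
        feature_names=iris.feature_names,
        kernel_width=kernel_width,
        random_state=0,  # fix the sampling so only the width differs across runs
    )
    exp = explainer.explain_instance(instance, model.predict_proba, num_features=4)
    print(f"kernel_width={kernel_width}: {exp.as_list()}")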
“…In addition to the sensitivity of results regarding parameter choices (Slack et al 2019), LIME is notoriously unstable even with fixed parameters. Zhou et al (2021) note that repeated runs using the same explanation algorithm on the same model for the same observation result in different model explanations, and they suggest significance testing as a remedy.…”
Section: Interpretation and Confidence Intervals
Mentioning (confidence: 99%)
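As an illustration of that run-to-run variation, the sketch below (assumed dataset and model, not the setup studied by Zhou et al.) calls LIME repeatedly with identical parameters; because no random seed is fixed, each call redraws the perturbation sample and the reported top features can change:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
model = LogisticRegression(max_iter=5000).fit(data.data, data.target)

# Explainer parameters are fixed, but random_state is not, so each
# explain_instance call draws a fresh set of perturbed samples.
explainer = LimeTabularExplainer(data.data, feature_names=list(data.feature_names))
instance = data.data[0]

for run in range(3):
    exp = explainer.explain_instance(instance, model.predict_proba, num_features=3)
    print(f"run {run}: {exp.as_list()}")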
“…One might intuit that a post-hoc explanation would never lead to a worse decision than one made using the same underlying model absent explanation. Recent research, however, has shown that not only are XAI methods innocuously fragile in practice [43,47], they are also susceptible to adversarial intervention [1,46,71]. In addition to these algorithmic issues, irreducible cognitive factors and intrinsic human biases [23,31,32] can perpetuate harmful effects in any algorithmically aided decision making context (explanations or not).…”
Section: Axiomatic Assumptions
Mentioning (confidence: 99%)
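For context, the attack in Slack et al. (2019) hides a model's bias from LIME and SHAP by detecting the off-manifold perturbation samples these explainers generate and answering them with an innocuous model, while real inputs are still handled by the biased model. The class below is an assumed, simplified rendering of that scaffolding idea (the wrapped models and the out-of-distribution detector are placeholders supplied by the caller), not the authors' released implementation:

import numpy as np

class AdversarialScaffold:
    """Routes queries so that post hoc explainers only see the innocuous model."""

    def __init__(self, biased_model, innocuous_model, ood_detector):
        self.biased_model = biased_model        # model whose bias is being hidden
        self.innocuous_model = innocuous_model  # benign-looking surrogate
        self.ood_detector = ood_detector        # predicts 1 for real (on-manifold) points

    def predict_proba(self, X):
        # Assumes binary classification for simplicity.
        X = np.asarray(X)
        on_manifold = np.asarray(self.ood_detector.predict(X)) == 1
        out = np.empty((len(X), 2))
        if on_manifold.any():
            # Genuine data points get the biased model's predictions.
            out[on_manifold] = self.biased_model.predict_proba(X[on_manifold])
        if (~on_manifold).any():
            # LIME/SHAP perturbation samples get innocuous predictions instead.
            out[~on_manifold] = self.innocuous_model.predict_proba(X[~on_manifold])
        return out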
“…These methods arose in computer vision and have demonstrated empirical utility in producing nonlinear factor models where the factors are conceptually sensible. Yet, due to the black-box nature of deep learning, explanations for how the factors are generated from the data, using local saliency maps for instance, are unreliable or imprecise (Laugel et al., 2019; Slack et al., 2020; Arun et al., 2020). In imaging applications, where the features are raw pixels, this type of interpretability is unnecessary.…”
Section: Disentangled Autoencoders
Mentioning (confidence: 99%)