2021
DOI: 10.1609/aaai.v35i11.17148

Right for Better Reasons: Training Differentiable Models by Constraining their Influence Functions

Abstract: Explaining black-box models such as deep neural networks is becoming increasingly important, as it helps to build trust and aids debugging. Popular forms of explanations map the features to a vector indicating their individual importance to a decision at the instance level. These explanations can then be used to prevent the model from learning a wrong bias present in the data, for example due to ambiguity. For instance, Ross et al.'s ``right for the right reasons'' propagates user explanations backwards to the network by formulating differenti…
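
The abstract refers to Ross et al.'s ``right for the right reasons'' objective, which penalizes explanation mass (typically input gradients) in regions a user has annotated as irrelevant. As a rough illustration only, a minimal PyTorch sketch of that general idea follows, assuming a classifier, a user-provided binary mask of irrelevant input regions, and a weighting term; the names rrr_loss, irrelevant_mask, and lam are illustrative assumptions, and RBR itself constrains influence functions rather than plain input gradients, so this is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def rrr_loss(model, x, y, irrelevant_mask, lam=10.0):
    """Sketch of a 'right for the right reasons'-style objective:
    the usual answer loss plus a penalty on input gradients that fall
    inside regions a user has marked as irrelevant.
    `irrelevant_mask` is a hypothetical {0,1} tensor shaped like `x`."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    answer_loss = F.cross_entropy(logits, y)

    # Gradient of the summed log-probabilities w.r.t. the input,
    # kept in the graph so the penalty itself stays differentiable.
    log_probs = F.log_softmax(logits, dim=1)
    input_grads, = torch.autograd.grad(log_probs.sum(), x, create_graph=True)

    # Penalize explanation mass inside the user-marked irrelevant regions.
    reason_loss = (irrelevant_mask * input_grads).pow(2).sum()

    return answer_loss + lam * reason_loss
```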

Cited by 17 publications (11 citation statements)
References 14 publications
“…Regularization techniques such as EXPO (Explanation-based Optimization) and RRR (Right for the Right Reasons) are designed to enhance black-box model interpretability. Although one can argue that “simplicity” of models is positively correlated with interpretability, this is based on how the interpretability is evaluated.…”
Section: Theory
Mentioning confidence: 99%
“…Intrinsic interpretability can also be improved by regularizing the input gradients, as they can identify which feature descriptors contributed towards a prediction [48]. Regularization techniques such as EXPO [49] and RRR [50] are designed to enhance black-box model interpretability. Although one can argue that “simplicity” of models is positively correlated with interpretability, this is based on how the interpretability is evaluated.…”
Section: Self-explaining Models
Mentioning confidence: 99%
“…This approach is summarised in Equations 2 and 3 using GradCAM explanations, where Mₙ ∈ {0, 1} is the ground-truth annotation and norm normalizes the Grad-CAM output; θ holds a model's parameters, with input X, labels y, predictions ŷ, and a parameter regularization term λ. Techniques such as Right for the Right Reasons using Integrated Gradients (RRR-IG) [10], Right for the Right Reasons using GradCAM (RRR-GC) [11], and Right for Better Reasons (RBR) [15] modify a model through explanation and training losses. Explanation losses can be computed between a ground-truth dataset of feature annotations and model-generated explanations, as shown in Equation 2 [11].…”
Section: Model Training
Mentioning confidence: 99%
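
The excerpt above describes an explanation loss computed between a ground-truth annotation Mₙ and a normalized Grad-CAM map, combined with a standard training loss weighted by λ. The PyTorch sketch below shows the general shape of such an objective, assuming the explanation map has already been computed and kept differentiable; the function names explanation_loss and xil_objective, and the choice to penalize explanation mass outside the annotated region, are illustrative assumptions rather than the cited paper's exact Equation 2.

```python
import torch
import torch.nn.functional as F

def explanation_loss(expl_map, annotation_mask):
    """Sketch of an explanation loss in the spirit of the excerpt above:
    compare a normalized model explanation (e.g. a Grad-CAM heat map)
    with a binary ground-truth annotation M_n in {0, 1}.
    Both tensors are assumed to share the same spatial shape (H, W)."""
    # 'norm' in the excerpt: scale the explanation into [0, 1].
    norm_expl = (expl_map - expl_map.min()) / (expl_map.max() - expl_map.min() + 1e-8)
    # One common variant: penalize explanation mass that falls outside
    # the annotated region (i.e. on areas marked as irrelevant).
    return ((1.0 - annotation_mask) * norm_expl).pow(2).mean()

def xil_objective(logits, y, expl_map, annotation_mask, lam=1.0):
    """Total loss = standard training loss + weighted explanation loss,
    mirroring the 'explanation and training losses' wording above;
    `lam` plays the role of the regularization term lambda."""
    return F.cross_entropy(logits, y) + lam * explanation_loss(expl_map, annotation_mask)
```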
“…Missing-region and spurious-region feedback are the two most commonly used types of user feedback in image-based XIL, under the assumption that instances are correctly classified. While techniques such as RRR-IG [10], RRR-GC [11], and RBR [15] use spurious-region feedback to fine-tune a model to ignore spurious features, Human Importance-aware Network Tuning (HINT) trains a model to focus on valid image objects [13].…”
Section: Feedback Collection
Mentioning confidence: 99%