Towards Robust Interpretability with Self-Explaining Neural Networks

Alvarez-Melis, David; Jaakkola, Tommi S.

doi:10.48550/arxiv.1806.07538

Cited by 26 publications

(43 citation statements)

References 0 publications

“…Though [7] discusses several advantages of gradient-based methods over rationalization, they are post-hoc and cannot impose structural constraints on the explanation. Other lines of work that provide post-hoc explanations include local perturbations [25,30]; locally fitting interpretable models [1,36]; and generating explanations in the form of edits to inputs that change model prediction to the contrast case [37].…”

Section: Model Interpretability Beyond Selective Rationalization (mentioning)

confidence: 99%

“…where (ii) results from the definition of $\theta_r^*$ in Equation (6); (i) is due to the linearity of $L_r$ with respect to $\beta$. More specifically, $L_r\big(\beta\pi^{(1)} + (1-\beta)\pi^{(2)},\ \theta_r^*(\beta\pi^{(1)}$…”

Section: A1 Proof To Theorem (mentioning)

confidence: 99%

“…i (X) (Y, fr(ei X; θ * r (βπ (1) + (1 − β)π (2) ))) =E X,Y βE M ∼π (1) (X) (Y, fr(M X; θ * r (βπ (1) + (1 − β)π (2) )))…”

Section: A1 Proof To Theorem (mentioning)

confidence: 99%

“…Proof. $\forall \alpha^{(1)} \neq \alpha^{(2)}$, $\beta \in [0, 1]$, our goal is to show that $L_a^*(\beta\alpha^{(1)} + (1-\beta)\alpha^{(2)}) \le \beta L_a^*(\alpha^{(1)}) + (1-\beta) L_a^*(\alpha^{(2)})$. (16) This follows from the following derivations.…”

Section: A2 Proof To Theorem (mentioning)

confidence: 99%

“…First, we would like to show that α (1) (X) X is a deterministic function of α (X) X, which means that any instances x (1) and x (2) that make α (x (1) ) x (1) = α (x (2) ) x (2) would also … [pos] and [neg] stand for the special biased symbols we appended with high correlations to the positive and negative classes.…”

Section: A3 Convexity Of Attention-based Explanation: a Special Case (mentioning)

confidence: 99%

Understanding Interlocking Dynamics of Cooperative Rationalization

Yu, Zhang, Chang, et al. 2021

Preprint

Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output. The selection mechanism is commonly integrated into the model itself by specifying a two-component cascaded system consisting of a rationale generator, which makes a binary selection of the input features (the rationale), and a predictor, which predicts the output based only on the selected features. The components are trained jointly to optimize prediction performance. In this paper, we reveal a major problem with such a cooperative rationalization paradigm: model interlocking. Interlocking arises when the predictor overfits to the features selected by the generator, thus reinforcing the generator's selection even if the selected rationales are sub-optimal. The fundamental cause of the interlocking problem is that the rationalization objective to be minimized is concave with respect to the generator's selection policy. We propose a new rationalization framework, called A2R, which introduces a third component into the architecture, a predictor driven by soft attention as opposed to selection. The generator now realizes both soft and hard attention over the features, and these are fed into the two different predictors. While the generator still seeks to support the original predictor performance, it also minimizes a gap between the two predictors. As we will show theoretically, since the attention-based predictor exhibits a better convexity property, A2R can overcome the concavity barrier. Our experiments on two synthetic benchmarks and two real datasets demonstrate that A2R can significantly alleviate the interlock problem and find explanations that better align with human judgments.

* Authors contributed equally to this paper. Work was done when SC was at MIT-IBM Watson AI Lab. We release our code at https://github.com/Gorov/Understanding_Interlocking.
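The abstract's architecture (one generator feeding a hard-selection predictor and a soft-attention predictor, with a gap term between them) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper or its released code: the linear predictors, the top-k selection rule, the example numbers, and the absolute difference used as a stand-in for the gap.

```python
import math

def soft_attention(scores):
    """Softmax over per-feature scores: a soft attention distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hard_selection(attention, k):
    """Binary rationale mask keeping the k highest-attention features (assumed rule)."""
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    keep = set(ranked[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(attention))]

def toy_predictor(weights, features, mask):
    """Hypothetical predictor: sigmoid of a weighted sum of the masked features."""
    z = sum(w * m * x for w, m, x in zip(weights, mask, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs: four features and the generator's raw scores for them.
features = [0.5, -1.0, 2.0, 0.1]
scores = [0.2, -0.3, 1.5, 0.0]
weights_hard = [1.0, 0.5, 0.8, -0.2]   # selection-based predictor (made-up weights)
weights_soft = [0.9, 0.4, 1.1, -0.1]   # attention-based predictor (made-up weights)

attn = soft_attention(scores)          # soft attention over the features
mask = hard_selection(attn, k=2)       # hard attention: the binary rationale

p_hard = toy_predictor(weights_hard, features, mask)
p_soft = toy_predictor(weights_soft, features, attn)

# Stand-in for the gap between the two predictors that the generator minimizes.
gap = abs(p_hard - p_soft)
print(mask, round(gap, 4))
```

In this sketch the same attention vector drives both paths, which mirrors the abstract's statement that the generator realizes both soft and hard attention; how the actual model measures and optimizes the gap is not specified in the text above.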

Evaluating perceptual and semantic interpretability of saliency methods: A case study of melanoma

Bokadia, Yang, et al. 2022

Applied AI Letters

In order to be useful, XAI explanations have to be faithful to the AI system they seek to elucidate and also interpretable to the people that engage with them. There exist multiple algorithmic methods for assessing faithfulness, but this is not so for interpretability, which is typically only assessed through expensive user studies. Here we propose two complementary metrics to algorithmically evaluate the interpretability of saliency map explanations. One metric assesses perceptual interpretability by quantifying the visual coherence of the saliency map. The second metric assesses semantic interpretability by capturing the degree of overlap between the saliency map and textbook features, that is, the features human experts use to make a classification. We use a melanoma dataset and a deep neural network classifier as a case study to explore how our two interpretability metrics relate to each other and to a faithfulness metric. Across six commonly used saliency methods, we find that none achieves high scores across all three metrics for all test images, but that different methods perform well in different regions of the data distribution. This variation between methods can be leveraged to consistently achieve high interpretability and faithfulness by using our metrics to inform saliency mask selection on a case-by-case basis. Our interpretability metrics provide a new way to evaluate saliency-based explanations and allow for the adaptive combination of saliency-based explanation methods.
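One common way to quantify "degree of overlap" between a saliency map and an expert-annotated feature mask is intersection over union (IoU) on a thresholded map. The sketch below assumes that choice; the abstract does not say which overlap measure the authors use, and the threshold, the function names, and the 3x3 example grids are all hypothetical.

```python
def binarize(saliency, threshold):
    """Turn a real-valued saliency map into a 0/1 mask (threshold is an assumption)."""
    return [[1 if value >= threshold else 0 for value in row] for row in saliency]

def overlap_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Two empty masks have no defined overlap; report 0.0 by convention here.
    return intersection / union if union else 0.0

# Hypothetical saliency map and expert-drawn "textbook feature" mask.
saliency = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.0],
]
expert_mask = [
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]

saliency_mask = binarize(saliency, threshold=0.5)
score = overlap_iou(saliency_mask, expert_mask)
print(round(score, 3))
```

A score like this could be computed per image and per saliency method, which is the kind of case-by-case comparison the abstract describes; the paper's own semantic metric may differ in both thresholding and normalization.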

Trustworthy Explainability Acceptance: A New Metric to Measure the Trustworthiness of Interpretable AI Medical Diagnostic Systems

Kaur, Uslu, Durresi, et al. 2021

Lecture Notes in Networks and Systems

We propose the Trustworthy Explainability Acceptance metric to evaluate explainable AI systems using an expert-in-the-loop approach. Our metric calculates acceptance by quantifying the distance between the explanations generated by the AI system and the reasoning provided by the experts based on their expertise and experience. Our metric also evaluates the trust of the experts, using our trust mechanism, so that different groups of experts can be included. Our metric can be easily adapted to any interpretable AI system and used in the standardization process of trustworthy AI systems. We illustrate the proposed metric using the high-stakes medical AI application of predicting Ductal Carcinoma in Situ (DCIS) recurrence. Our metric successfully captures the explainability of AI systems in DCIS recurrence as judged by experts.
