Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.408

ERASER: A Benchmark to Evaluate Rationalized NLP Models

Abstract: State-of-the-art models in NLP are now predominantly based on deep neural networks that are opaque in terms of how they come to make predictions. This limitation has increased interest in designing more interpretable deep models for NLP that reveal the 'reasoning' behind model outputs. But work in this direction has been conducted on different datasets and tasks with correspondingly unique aims and metrics; this makes it difficult to track progress. We propose the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark…

Cited by 319 publications (425 citation statements)
References 65 publications
“…Relation to Automatic Tests. Prior works have proposed automatic metrics for feature importance estimates (Nguyen, 2018; Hooker et al., 2019; DeYoung et al., 2020). Typically these operate by checking that model behavior follows reasonable patterns on counterfactual inputs constructed using the explanation, e.g., by masking "important" features and checking that a class score drops.…”
Section: Evaluating Interpretability (mentioning, confidence: 99%)
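As a concrete illustration of the mask-and-check test described in the statement above, here is a minimal sketch of a deletion-style faithfulness check. The `predict_proba` and `importance` callables are hypothetical stand-ins for a model's forward pass and a feature-importance estimator; neither name comes from the cited papers.

```python
# Deletion-style check: mask the tokens an attribution method marks as most
# important and verify that the predicted-class score drops.
from typing import Callable, List


def deletion_test(tokens: List[str],
                  label: int,
                  predict_proba: Callable[[List[str]], List[float]],
                  importance: Callable[[List[str]], List[float]],
                  k: int = 5,
                  mask_token: str = "[MASK]") -> float:
    """Return the drop in the predicted-class score after masking the k
    most important tokens; a faithful importance estimate should yield a large drop."""
    original = predict_proba(tokens)[label]
    scores = importance(tokens)
    top_k = set(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    masked = [mask_token if i in top_k else t for i, t in enumerate(tokens)]
    return original - predict_proba(masked)[label]
```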
“…We aim at faithful explanations: identifying the actual reason for the model's prediction, which is essential for accountability, fairness, and credibility (Chakraborty et al., 2017; Wu and Mooney, 2019), and for evaluating whether a model's prediction is based on the correct evidence. The recently published ERASER benchmark (DeYoung et al., 2020) provides multiple datasets with annotated rationales, i.e., parts of the input document that are essential for correct predictions of the target variable (Zaidan et al., 2007). In contrast to post-hoc techniques for identifying relevant input parts, such as LIME (Ribeiro et al., 2016) or input reduction (Feng et al., 2018), we focus on models that are faithful by design, in which the selected rationale matches the full underlying evidence used for the prediction.…”
Section: Select (mentioning, confidence: 99%)
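One way to read "faithful by design" in the statement above is a select-then-predict pipeline in which the classifier only ever sees the selected rationale, so the rationale is, by construction, the full evidence behind the prediction. The sketch below assumes hypothetical `select_rationale` and `classify` components; it is an illustration, not the cited authors' implementation.

```python
# Minimal select-then-predict sketch: the predictor receives only the selected
# rationale, so the rationale cannot understate the evidence used.
from typing import Callable, List, Tuple


def rationalized_predict(tokens: List[str],
                         select_rationale: Callable[[List[str]], List[bool]],
                         classify: Callable[[List[str]], int]) -> Tuple[int, List[str]]:
    mask = select_rationale(tokens)                      # which tokens to keep
    rationale = [t for t, keep in zip(tokens, mask) if keep]
    prediction = classify(rationale)                     # sees nothing but the rationale
    return prediction, rationale
```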
“…Existing strategies mostly rely on REINFORCE (Williams, 1992) style learning (Lei et al., 2016) or on training two disjoint models (Lehman et al., 2019; DeYoung et al., 2020), in the latter case depending on rationale supervision. This poses critical limitations, as rationale annotations are costly to obtain and, in many cases, not available.…”
Section: Select (mentioning, confidence: 99%)
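For context on the REINFORCE-style strategy mentioned above, the sketch below shows the standard score-function surrogate for training a binary rationale selector, in the spirit of Lei et al. (2016). The `task_loss_fn` callable and the sparsity penalty weight are assumptions for illustration, not details taken from the cited work.

```python
# One REINFORCE-style update for a rationale selector: sample a hard 0/1 mask
# over tokens, measure the (non-differentiable) cost of predicting from the
# masked input, and weight the selector's log-probability by that cost so its
# gradient estimates the gradient of the expected cost.
import torch


def reinforce_surrogate(selector_logits: torch.Tensor,   # (seq_len,) keep/drop logits
                        task_loss_fn,                     # mask -> scalar predictor loss
                        sparsity_weight: float = 0.01) -> torch.Tensor:
    dist = torch.distributions.Bernoulli(logits=selector_logits)
    mask = dist.sample()                                  # hard rationale mask, no gradient
    with torch.no_grad():
        # Cost = downstream task loss plus a penalty on rationale length.
        cost = task_loss_fn(mask) + sparsity_weight * mask.sum()
    # Score-function estimator: gradients flow only through the log-probability.
    return cost * dist.log_prob(mask).sum()
```

Calling `.backward()` on the returned surrogate gives an unbiased, if high-variance, gradient estimate for the selector; in practice a baseline is usually subtracted from the cost to reduce that variance.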
“…How should these different approaches be compared? Several diagnostic tests have been proposed: Jain and Wallace (2019) assessed the explanatory power of attention weights by measuring their correlation with input gradients; Wiegreffe and Pinter (2019) and DeYoung et al. (2020) developed more informative tests, including a combination of comprehensiveness and sufficiency metrics and the correlation with human rationales; Jacovi and Goldberg (2020) proposed a set of evaluation recommendations and a graded notion of faithfulness. Most proposed frameworks rely on correlations and counterfactual simulation, sidestepping the main practical goal of prediction explainability: the ability to communicate an explanation to a human user.…”
Section: Introduction (mentioning, confidence: 99%)
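For reference, the comprehensiveness and sufficiency metrics mentioned above measure, respectively, how much the predicted-class probability falls when the rationale is removed from the input, and how well the rationale alone supports the prediction. A minimal sketch, assuming a hypothetical `predict_proba(tokens) -> class probabilities` function:

```python
# Comprehensiveness: p(y | full input) - p(y | input without the rationale).
# Sufficiency:       p(y | full input) - p(y | rationale only).
# Higher comprehensiveness and lower sufficiency indicate a better rationale.
from typing import Callable, List, Tuple


def comprehensiveness_sufficiency(tokens: List[str],
                                  rationale: List[bool],
                                  label: int,
                                  predict_proba: Callable[[List[str]], List[float]]
                                  ) -> Tuple[float, float]:
    full = predict_proba(tokens)[label]
    without_rationale = [t for t, r in zip(tokens, rationale) if not r]
    only_rationale = [t for t, r in zip(tokens, rationale) if r]
    comprehensiveness = full - predict_proba(without_rationale)[label]
    sufficiency = full - predict_proba(only_rationale)[label]
    return comprehensiveness, sufficiency
```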