2022
DOI: 10.48550/arxiv.2205.11097
Preprint

A Fine-grained Interpretability Evaluation Benchmark for Neural NLP

Abstract: While there is increasing concern about the interpretability of neural models, the evaluation of interpretability remains an open problem, due to the lack of proper evaluation datasets and metrics. In this paper, we present a novel benchmark to evaluate the interpretability of both neural models and saliency methods. This benchmark covers three representative NLP tasks: sentiment analysis, textual similarity and reading comprehension, each provided with both English and Chinese annotated data. In order to prec…

Citations: cited by 1 publication (1 citation statement)
References: 31 publications
“…The IG is an attempt to assign an attribution value to each input feature which measures the extent to which an input contributes to the final prediction [12]. A recent study is carried out to set a benchmark over three representative NLP tasks (sentiment analysis, textual similarity and reading comprehension) for interpretability of both neural models and saliency methods [14] thereby emphasizing the need of LIME and IG for downstream NLP tasks.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
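The citation statement above describes IG (integrated gradients) as assigning each input feature an attribution value that measures its contribution to the prediction. The sketch below illustrates that idea on a toy logistic scorer. It is not the benchmark's or the citing paper's implementation: the model, the weights `w` and `b`, the input `x`, the all-zeros baseline, and the step count are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy differentiable scorer (assumed): logistic regression over input features."""
    return sigmoid(np.dot(w, x) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients.

    Averages the gradient along the straight path from baseline to x,
    then scales it by (x - baseline), giving one attribution per feature.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array(
        [model_grad(baseline + a * (x - baseline), w, b) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and input, for illustration only.
w = np.array([2.0, -1.0, 0.5])
b = 0.1
x = np.array([1.0, 0.5, -2.0])
baseline = np.zeros_like(x)  # assumed "no information" reference input

attributions = integrated_gradients(x, baseline, w, b)
print("attributions:", attributions)
print("sum of attributions:", attributions.sum())
print("F(x) - F(baseline):", model(x, w, b) - model(baseline, w, b))
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline. The last two printed values should therefore agree closely.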