Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1073

Errudite: Scalable, Reproducible, and Testable Error Analysis

Abstract: Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second,…
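As an illustration of the first principle, the sketch below shows how an error group can be defined as a precise, executable filter over all instances rather than a hand-labeled sample of errors. This is a minimal sketch in plain Python, not Errudite's actual DSL or API; the `Instance` fields, the `error_group` helper, and the example rule are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Instance:
    # Hypothetical QA instance; field names are assumptions, not Errudite's schema.
    question: str
    gold_answer: str
    predicted_answer: str
    f1: float  # per-instance F1 of the prediction against the gold answer

def error_group(name: str, rule: Callable[[Instance], bool], data: List[Instance]) -> List[Instance]:
    """Collect every instance in the full dataset that satisfies a precisely defined rule."""
    group = [inst for inst in data if rule(inst)]
    share = len(group) / max(len(data), 1)
    print(f"{name}: {len(group)}/{len(data)} instances ({share:.1%} of the data)")
    return group

# A reproducible group definition: long "why" questions that the model gets mostly wrong.
long_why_errors = lambda inst: (
    inst.question.lower().startswith("why")
    and len(inst.question.split()) > 20
    and inst.f1 < 0.5
)

# long_why_group = error_group("long_why_errors", long_why_errors, dataset)
```

Because the rule is executable rather than a label assigned by hand, the same group can be re-created exactly for any model or dataset split, which is what makes the analysis reproducible and testable.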

Cited by 104 publications (62 citation statements)
References 32 publications
“…A similar analysis on DROP shows that MTMSN does substantially worse on event re-ordering (47.3 F1) than on adding compositional reasoning steps (67.5 F1). We recommend authors categorize their perturbations up front in order to simplify future analyses and bypass some of the pitfalls of post-hoc error categorization (Wu et al., 2019). Additionally, it's worth discussing the dependency parsing result.…”
Section: Fine-grained Analysis of Contrast Sets (mentioning, confidence: 99%)
“…The grouping ensures that we do not mistakenly prioritize groups that are actually well-handled on average. We follow the approach proposed by Wu et al. (2019), and extend their Errudite framework to the relation extraction task. After formulating a hypothesis, we assess the error prevalence over the entire dataset split to validate whether the hypothesis holds, i.e.…”
Section: Error Hypotheses Formulation and Adversarial Rewriting (mentioning, confidence: 99%)
“…in intermediate layers. More similar to our approach is rewriting of instances (Jia and Liang, 2017; Ribeiro et al., 2018), but instead of evaluating model robustness we use rewriting to test explicit error hypotheses, similar to Wu et al. (2019).…”
Section: Related Work (mentioning, confidence: 99%)
“…In contrast to those two tools, our tool offers visualizations of a variety of statistically interesting aspects of data splits in order to better understand model behaviours. Wu et al. (2019) provide an interactive tool for error analysis called ERRUDITE. It supports, i.a., automated counterfactual rewriting for testing hypotheses about errors.…”
Section: Tools for Analyzing NLP Models (mentioning, confidence: 99%)