Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018
DOI: 10.18653/v1/d18-1534
When data permutations are pathological: the case of neural natural language inference

Abstract: Consider two competitive machine learning models, one of which was considered state-of-the-art, and the other a competitive baseline. Suppose that, just by permuting the examples of the training set (say, by reversing the original order, by shuffling, or by mini-batching), you could report substantially better or worse performance for the system of your choice, by multiple percentage points. In this paper, we illustrate this scenario for a trending NLP task: Natural Language Inference (NLI). We show that for the two…

Cited by 10 publications (9 citation statements). References 6 publications.
“…These techniques were until recently quite rare in this field, despite the inherently repeatable nature of most natural language processing experiments. Researchers attempting replications or reproductions have reported problems with the availability of data (Mieskes, 2017; Wieling et al., 2018) and software (Pedersen, 2008), and with various details of implementation (Fokkens et al., 2013; Reimers and Gurevych, 2017; Schluter and Varab, 2018). While we cannot completely avoid these pitfalls, we select a task, English part-of-speech tagging, for which both data and software are abundantly available.…”
Section: Replication and Reproduction
Mentioning confidence: 99%
“…However, they are known for being “black boxes” which are not easily interpretable. Recent interest in interpreting these methods has led to new lines of research which attempt to discover what linguistic phenomena neural networks are able to learn (Linzen et al., 2016; Gulordava et al., 2018; Conneau et al., 2018), how robust neural networks are to perturbations in input data (Ribeiro et al., 2018; Ebrahimi et al., 2018; Schluter and Varab, 2018), and what biases they propagate (Park et al., 2018; Zhao et al., 2018; Kiritchenko and Mohammad, 2018).…”
Section: Introduction
Mentioning confidence: 99%
“…The SNLI data set is a textual entailment recognition data set published by Stanford University. SNLI is manually annotated and contains 570k text pairs, used as testing and training sets for NLI systems [25][26][27][28]. There are three kinds of labels: entailment, contradiction, and neutral.…”
Section: SNLI Dataset
Mentioning confidence: 99%