Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018)
DOI: 10.18653/v1/n18-1179

Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness

Abstract: Natural Language Inference is a challenging task that has received substantial attention, and state-of-the-art models now achieve impressive test set performance in the form of accuracy scores. Here, we go beyond this single evaluation metric to examine robustness to semantically valid alterations to the input data. We identify three factors (insensitivity, polarity, and unseen pairs) and compare their impact on three SNLI models under a variety of conditions. Our results demonstrate a number of strengths and w…



Cited by 35 publications (28 citation statements)
References 16 publications
“…Several datasets were constructed by modifying or extracting examples from existing datasets. For instance, Sanchez et al. (2018) and Glockner et al. (2018) extracted examples from SNLI (Bowman et al., 2015) and replaced specific words such as hypernyms, synonyms, and antonyms, followed by manual verification. Linzen et al. (2016), on the other hand, extracted examples of subject-verb agreement from raw texts using heuristics, resulting in a large-scale dataset.…”
Section: Construction Methods
confidence: 99%
“…They showed with adversarial examples that most models can be easily tricked by modifications to the data that do not confuse humans. Similarly, Sanchez et al. (2018) performed controlled experiments on the robustness of several Natural Language Inference models by altering hypernym, hyponym, and antonym relations in the data. Both studies revealed a major weakness of the models: they largely rely on pattern matching instead of the human decision-making processes the tasks require, including heuristics (Gigerenzer and Gaissmaier, 2011) and elimination by aspects (Tversky, 1972).…”
Section: Introduction
confidence: 99%
“…Subpopulations: Snorkel (Ratner et al., 2017), hard/easy sets (Gururangan et al., 2018), Errudite, compositional-sensitivity. Transformations: NLPAug (Ma, 2019), counterfactuals (Kaushik et al., 2019), stress test (Naik et al., 2018), bias factors (Sanchez et al., 2018). Evaluation sets: (Cooper et al., 1994), RTE (Dagan et al., 2005), SICK (Marelli et al., 2014), SNLI, MNLI (Williams et al., 2018), CheckList (Ribeiro et al., 2020), HANS (McCoy et al., 2019b), Quantified NLI (Geiger et al., 2018), MPE (Lai et al., 2017), EQUATE (Ravichander et al., 2019), DNC, ImpPres (Jeretic et al., 2020), Systematicity (Yanaka et al., 2020), ConjNLI (Saha et al., 2020), SherLIiC (Schmitt and Schütze, 2019). Example: author new movie reviews in the style of a newspaper columnist.…”
Section: Subpopulations
confidence: 99%