Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2018
DOI: 10.18653/v1/p18-2103

Breaking NLI Systems with Sentences that Require Simple Lexical Inferences

Abstract: We create a new NLI test set that shows the deficiency of state-of-the-art models in inferences that require lexical and world knowledge. The new examples are simpler than the SNLI test set, containing sentences that differ by at most one word from sentences in the training set. Yet, the performance on the new test set is substantially worse across systems trained on SNLI, demonstrating that these systems are limited in their generalization ability, failing to capture many simple inferences.

Cited by 301 publications (328 citation statements)
References 19 publications
“…This contrasts with past work on adversarial examples (e.g. Jia and Liang, 2017; Glockner et al., 2018; Belinkov and Bisk, 2018) which consider cases where an out-of-distribution test set is constructed to be adversarial.…”
Section: Introduction
Type: mentioning
confidence: 78%
“…These datasets contain downward inferences, but they are designed not to require lexical knowledge. There are also NLI datasets which expand lexical knowledge by replacing words using lexical rules (Monz and de Rijke, 2001; Glockner et al., 2018; Naik et al., 2018; Poliak et al., 2018a). In these works, however, little attention has been paid to downward inferences.…”
Section: Upward
Type: mentioning
confidence: 99%
“…A. Glockner et al. (2018) This dataset is created by modifying SNLI examples with single word replacements of different lexical relations, based on WordNet. It tests lexical inferences and relatively simple world knowledge.…”
Section: Overall Results and Analysis
Type: mentioning
confidence: 99%
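
The single-word replacement idea described in the abstract and in the last citation statement can be illustrated with a minimal sketch. This is not the authors' pipeline: the lexicon `TOY_LEXICON`, the function `single_word_variants`, the example sentence, and the labels attached to each replacement are all hypothetical and hand-written for illustration, whereas the cited dataset is described as drawing its replacements from WordNet relations.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> list of (replacement, assumed NLI label).
# These entries are hand-written assumptions, not taken from the dataset.
TOY_LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "guitar": [("instrument", "entailment"), ("violin", "contradiction")],
    "red": [("blue", "contradiction")],
}


def single_word_variants(premise: str) -> List[Tuple[str, str, str]]:
    """Return (premise, hypothesis, label) triples.

    Each hypothesis differs from the premise by exactly one word,
    chosen from the toy lexicon above.
    """
    tokens = premise.split()
    examples = []
    for i, token in enumerate(tokens):
        for replacement, label in TOY_LEXICON.get(token.lower(), []):
            # Swap only the token at position i; keep the rest unchanged.
            hypothesis = " ".join(tokens[:i] + [replacement] + tokens[i + 1:])
            examples.append((premise, hypothesis, label))
    return examples


examples = single_word_variants("A man plays a red guitar")
for premise, hypothesis, label in examples:
    print(f"{premise} -> {hypothesis} [{label}]")
```

Because every hypothesis reuses the premise except for one token, a model can only label such pairs correctly if it knows how the two swapped words relate, which is the kind of lexical inference the abstract says the test set targets.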