Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.768
Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition

Abstract: Natural language inference (NLI) is an increasingly important task for natural language understanding, which requires one to infer whether a sentence entails another. However, the ability of NLI models to make pragmatic inferences remains understudied. We create an IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated sentence pairs illustrating well-studied pragmatic inference types. We use IMPPRES to evaluate whether BERT, InferSent, and BOW NLI models tr…

Cited by 60 publications (81 citation statements). References 52 publications.
“…Given the wide impact that large-scale NLI datasets, such as SNLI and MNLI, have had on recent progress in NLU for English, we hope that our resource will likewise help accelerate progress on Chinese NLU. In addition to making more progress on Chinese NLI, future work will also focus on using our dataset for doing Chinese model probing (e.g., building on work such as Warstadt et al (2019); Jeretic et al (2020)) and sentence representation learning (Reimers and Gurevych, 2019), as well as for investigating bias-reduction techniques (Clark et al, 2019; Belinkov et al, 2019) for languages other than English.…”
Section: Discussion
confidence: 99%
“…These large corpora have been used as part of larger benchmark sets, e.g., GLUE (Wang et al, 2018), and have proven useful for problems beyond NLI, such as sentence representation and transfer learning (Conneau et al, 2017; Subramanian et al, 2018; Reimers and Gurevych, 2019), automated question-answering (Khot et al, 2018; Trivedi et al, 2019) and model probing (Warstadt et al, 2019; Geiger et al, 2020; Jeretic et al, 2020).…”
Section: Related Work
confidence: 99%
“…They then used these datasets to evaluate how well a wide class of RTE models capture these phenomena. Other RTE datasets that target more specific phenomena were created using automatic methods, including Jeretic et al (2020)'s "IMPPRES" diagnostic RTE dataset that tests for IMPlicatures and PRESuppositions.…”
Section: Automatically Created
confidence: 99%
“…The GLUE and SuperGLUE datasets include diagnostic sets where annotators manually labeled samples of examples as requiring a broad range of linguistic phenomena. The types of phenomena manually labeled include lexical semantics, predicate-argument structure, logic, and common sense or world knowledge.…”
[Table residue reconstructed — phenomena and the datasets targeting them: Proto-Roles (White et al, 2017); Paraphrastic Inference (White et al, 2017); Event Factuality (Poliak et al, 2018b; Staliūnaitė, 2018); Anaphora Resolution (White et al, 2017; Poliak et al, 2018b); Lexicosyntactic Inference (Pavlick and Callison-Burch, 2016; Poliak et al, 2018b; Glockner et al, 2018); Compositionality (Dasgupta et al, 2018); Prepositions (Kim et al, 2019); Comparatives (Kim et al, 2019; Richardson et al, 2020); Quantification/Numerical Reasoning (Naik et al, 2018; Kim et al, 2019; Richardson et al, 2020); Spatial Expressions (Kim et al, 2019); Negation (Naik et al, 2018; Kim et al, 2019; Richardson et al, 2020); Tense & Aspect (Kober et al, 2019); Veridicality (Poliak et al, 2018b); Monotonicity (Yanaka et al, 2019, 2020; Richardson et al, 2020); Presupposition (Jeretic et al, 2020); Implicatures (Jeretic et al, 2020); Temporal Reasoning (Vashishtha et al, 2020)]
Section: Manually Created
confidence: 99%
“…In light of the low agreement on explicit modeling of the task of complement coercion, we turn to a different crowdsourcing approach which has proven successful for many linguistic phenomena: using NLI, as discussed above (§2). NLI was used to collect data for a wide range of linguistic phenomena: Paraphrase Inference, Anaphora Resolution, Numerical Reasoning, Implicatures and more (White et al, 2017; Poliak et al, 2018; Jeretic et al, 2020; Yanaka et al, 2020; Naik et al, 2018) (see Poliak (2020)). Therefore, we take a similar approach, with similar methodologies, and make use of NLI as an evaluation setup for the complement coercion phenomenon.…”
Section: NLI for Complement Coercion
confidence: 99%