Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition

Jeretič, Paloma; Warstadt, Alex; Bhooshan, Suvrat; Williams, Adina

doi:10.18653/v1/2020.acl-main.768

Cited by 60 publications

(81 citation statements)

References 52 publications

“…Given the wide impact that large-scale NLI datasets, such as SNLI and MNLI, have had on recent progress in NLU for English, we hope that our resource will likewise help accelerate progress on Chinese NLU. In addition to making more progress on Chinese NLI, future work will also focus on using our dataset for doing Chinese model probing (e.g., building on work such as Warstadt et al (2019); Jeretic et al (2020)) and sentence representation learning (Reimers and Gurevych, 2019), as well as for investigating bias-reduction techniques (Clark et al, 2019; Belinkov et al, 2019) for languages other than English.…”

Section: Discussion (mentioning)

confidence: 99%

“…These large corpora have been used as part of larger benchmark sets, e.g., GLUE (Wang et al, 2018), and have proven useful for problems beyond NLI, such as sentence representation and transfer learning (Conneau et al, 2017; Subramanian et al, 2018; Reimers and Gurevych, 2019), automated question-answering (Khot et al, 2018; Trivedi et al, 2019) and model probing (Warstadt et al, 2019; Geiger et al, 2020; Jeretic et al, 2020).…”

Section: Related Work (mentioning)

confidence: 99%

OCNLI: Original Chinese Natural Language Inference

Hu, Richardson, Xu et al. 2020

Findings of the Association for Computational Linguistics: EMNLP 2020

Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this paper, we present the first large-scale NLI dataset (consisting of ∼56,000 annotated sentence pairs) for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI). Unlike recent attempts at extending NLI to other languages, our dataset does not rely on any automatic translation or non-expert annotation. Instead, we elicit annotations from native speakers specializing in linguistics. We follow closely the annotation protocol used for MNLI, but create new strategies for eliciting diverse hypotheses. We establish several baseline results on our dataset using state-of-the-art pre-trained models for Chinese, and find even the best performing models to be far outpaced by human performance (∼12% absolute performance gap), making it a challenging new resource that we hope will help to accelerate progress in Chinese natural language understanding. To the best of our knowledge, this is the first human-elicited MNLI-style corpus for a non-English language.

“…They then used these datasets to evaluate how well a wide class of RTE models capture these phenomena. Other RTE datasets that target more specific phenomena were created using automatic methods, including Jeretic et al (2020)'s "IMPRES" diagnostic RTE dataset that tests for IMPlicatures and PRESuppositions.…”

Section: Automatically Created (mentioning)

confidence: 99%

“…The GLUE and SuperGlue datasets include diagnostic sets where annotators manually labeled samples of examples as requiring a broad range of linguistic phenomena. The types of phenomena manually labeled include lexical semantics, predicate-argument structure, logic, and common sense or world knowledge.…”

Phenomena and datasets listed alongside this passage in the excerpt: Proto-Roles (White et al, 2017), Paraphrastic Inference (White et al, 2017), Event Factuality (Poliak et al, 2018b; Staliūnaitė, 2018), Anaphora Resolution (White et al, 2017; Poliak et al, 2018b), Lexicosyntactic Inference (Pavlick and Callison-Burch, 2016; Poliak et al, 2018b; Glockner et al, 2018), Compositionality (Dasgupta et al, 2018), Prepositions (Kim et al, 2019), Comparatives (Kim et al, 2019; Richardson et al, 2020), Quantification/Numerical Reasoning (Naik et al, 2018; Kim et al, 2019; Richardson et al, 2020), Spatial Expressions (Kim et al, 2019), Negation (Naik et al, 2018; Kim et al, 2019; Richardson et al, 2020), Tense & Aspect (Kober et al, 2019), Veridicality (Poliak et al, 2018b), Monotonicity (Yanaka et al, 2019, 2020; Richardson et al, 2020), Presupposition (Jeretic et al, 2020), Implicatures (Jeretic et al, 2020), Temporal Reasoning (Vashishtha et al, 2020).

Section: Manually Created (mentioning)

confidence: 99%

A survey on Recognizing Textual Entailment as an NLP Evaluation

Poliak 2020

Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

Recognizing Textual Entailment (RTE) was proposed as a unified evaluation framework to compare semantic understanding of different NLP systems. In this survey paper, we provide an overview of different approaches for evaluating and understanding the reasoning capabilities of NLP systems. We then focus our discussion on RTE by highlighting prominent RTE datasets as well as advances in RTE datasets that focus on specific linguistic phenomena that can be used to evaluate NLP systems on a fine-grained level. We conclude by arguing that when evaluating NLP systems, the community should utilize newly introduced RTE datasets that focus on specific linguistic phenomena.

“…In light of the low agreements on explicit modeling of the task of complement coercion, we turn to a different crowdsourcing approach which was proven successful for many linguistic phenomena - using NLI as discussed above (§2). NLI was used to collect data for a wide range of linguistic phenomena: Paraphrase Inference, Anaphora Resolution, Numerical Reasoning, Implicatures and more (White et al, 2017; Poliak et al, 2018; Jeretic et al, 2020; Yanaka et al, 2020; Naik et al, 2018) (see Poliak (2020)). Therefore, we take a similar approach, with similar methodologies, and make use of NLI as an evaluation setup for the complement coercion phenomenon.…”

Section: NLI for Complement Coercion (mentioning)

confidence: 99%

The Extraordinary Failure of Complement Coercion Crowdsourcing

Elazar, Basmov, Ravfogel et al. 2020

Proceedings of the First Workshop on Insights From Negative Results in NLP

Crowdsourcing has eased and scaled up the collection of linguistic annotation in recent years. In this work, we follow known methodologies of collecting labeled data for the complement coercion phenomenon. These are constructions with an implied action, e.g., "I started a new book I bought last week", where the implied action is reading. We aim to collect annotated data for this phenomenon by reducing it to either of two known tasks: Explicit Completion and Natural Language Inference. However, in both cases, crowdsourcing resulted in low agreement scores, even though we followed the same methodologies as in previous work. Why does the same process fail to yield high agreement scores? We specify our modeling schemes, highlight the differences with previous work and provide some insights about the task and possible explanations for the failure. We conclude that specific phenomena require tailored solutions, not only in specialized algorithms, but also in data collection methods.
