A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Williams, Adina; Nangia, Nikita; Bowman, Samuel R.

doi:10.18653/v1/n18-1101

Cited by 2,501 publications

(2,235 citation statements)

References 29 publications

Supporting

Mentioning

2,218

Contrasting

Unclassified

“…The vocabulary size is 75K words. Table 7: Results on language inference on MultiNLI (Williams et al, 2017), matched/mismatched scenario (MNLI1/2).…”

Section: B Implementation and Experimental Details (mentioning)

confidence: 99%

“…We use White et al (2017)'s Unified Semantic Evaluation Framework (USEF) that recasts three semantic phenomena NLI: 1) semantic proto-roles, 2) paraphrastic inference, 3) and complex anaphora resolution. Additionally, we evaluate the NMT sentence representations on 4) Multi-NLI, a recent extension of the Stanford Natural Language Inference dataset (SNLI) (Bowman et al, 2015) that includes multiple genres and domains (Williams et al, 2017). We contextualize our results with a standard neural encoder described in Bowman et al (2015) and used in White et al (2017).…”

Section: Introduction (mentioning)

confidence: 99%

On the Evaluation of Semantic Phenomena in Neural Machine Translation Using Natural Language Inference

Poliak, Belinkov, Glass et al. 2018

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Hu

We propose a process for investigating the extent to which sentence representations arising from neural machine translation (NMT) systems encode distinct semantic phenomena. We use these representations as features to train a natural language inference (NLI) classifier based on datasets recast from existing semantic annotations. In applying this process to a representative NMT system, we find its encoder appears most suited to supporting inferences at the syntax-semantics interface, as compared to anaphora resolution requiring world knowledge. We conclude with a discussion on the merits and potential deficiencies of the existing process, and how it may be improved and extended as a broader framework for evaluating semantic coverage.

“…We demonstrated promising initial improvements based on multiple datasets and metrics, even when the entailment knowledge was extracted from a domain different from the summarization domain. Our next steps to this workshop paper include: (1) stronger summarization baselines, e.g., using pointer copy mechanism (See et al, 2017; Nallapati et al, 2016), and also adding this capability to the entailment generation model; (2) results on CNN/Daily Mail corpora (Nallapati et al, 2016); (3) incorporating entailment knowledge from other news-style domains such as the new Multi-NLI corpus (Williams et al, 2017), and (4) demonstrating mutual improvements on the entailment generation task.…”

Section: Conclusion and Next Steps (mentioning)

confidence: 99%

“…Importantly, these improvements are achieved despite the fact that the domain of the entailment dataset (image captions) is substantially different from the domain of the summarization datasets (general news), which suggests that the model is learning certain domain-independent inference skills. Our next steps to this workshop paper include incorporating stronger pointer-based models and employing the new multi-domain entailment corpus (Williams et al, 2017).…”

Section: Introduction (mentioning)

confidence: 99%

Towards Improving Abstractive Summarization via Entailment Generation

Pasunuru, Guo, Bansal 2017

Proceedings of the Workshop on New Frontiers in Summarization

Abstractive summarization, the task of rewriting and compressing a document into a short summary, has achieved considerable success with neural sequence-to-sequence models. However, these models can still benefit from stronger natural language inference skills, since a correct summary is logically entailed by the input document, i.e., it should not contain any contradictory or unrelated information. We incorporate such knowledge into an abstractive summarization model via multi-task learning, where we share its decoder parameters with those of an entailment generation model. We achieve promising initial improvements based on multiple metrics and datasets (including a test-only setting). The domain mismatch between the entailment (captions) and summarization (news) datasets suggests that the model is learning some domain-agnostic inference skills.

“…Given two sentences, the first being the premise and the second the hypothesis, the goal of NLI is to train a classifier to predict whether the relation of the hypothesis to the premise is one of entailment, contradiction or a neutral relation. The training and test data for this 3-way classification task at RepEval 2017 are drawn from the Multi-Genre NLI, or MultiNLI corpus (see Williams et al (2017) for details). Task participants are provided with both training and development datasets, where parts of the development data match the training data in terms of genre, topic etc.…”

Section: Introduction (mentioning)

confidence: 99%
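The 3-way classification setup described in the statement above can be sketched in a few lines. This is only an illustration, not the shared task's code: the `NLIExample` type, the `accuracy` helper, and the `always_neutral` baseline are hypothetical names, and the three sentence pairs are invented for the example rather than taken from MultiNLI.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# The three relations an NLI classifier chooses between.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its gold label (hypothetical type)."""
    premise: str
    hypothesis: str
    label: str  # expected to be one of LABELS


def accuracy(examples: Iterable[NLIExample],
             predict: Callable[[str, str], str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(1 for ex in examples
                  if predict(ex.premise, ex.hypothesis) == ex.label)
    return correct / len(examples)


def always_neutral(premise: str, hypothesis: str) -> str:
    """Trivial baseline that ignores its input (an assumption for the demo)."""
    return "neutral"


# Invented toy pairs, one per label; not drawn from the corpus.
toy = [
    NLIExample("A man is playing a guitar.",
               "A person is making music.", "entailment"),
    NLIExample("A man is playing a guitar.",
               "The man is asleep.", "contradiction"),
    NLIExample("A man is playing a guitar.",
               "The man is a professional musician.", "neutral"),
]

toy_accuracy = accuracy(toy, always_neutral)
```

With one example per label, the constant baseline scores one third. A matched/mismatched evaluation would call `accuracy` once per development set with the same `predict` function.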

LCT-MALTA's Submission to RepEval 2017 Shared Task

Vu, Pham, Bai et al. 2017

Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP

We present in this paper our team LCT-MALTA's submission to the RepEval 2017 Shared Task on natural language inference. Our system is a simple system based on a standard BiLSTM architecture, using as input GloVe word embeddings augmented with further linguistic information. We use max pooling on the BiLSTM outputs to obtain embeddings for sentences. On both the matched and the mismatched test sets, our system clearly beats the shared task's BiLSTM baseline model.
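The max-pooling step mentioned in the abstract above can be illustrated without a neural network library. The sketch below assumes the BiLSTM has already produced one output vector per token; the function name `max_pool_over_time` and the hard-coded numbers are illustrative stand-ins, not the authors' implementation or values.

```python
from typing import List, Sequence


def max_pool_over_time(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Collapse per-token vectors into one sentence vector.

    For each dimension, keep the largest value seen across all tokens,
    so the result has the same width as a single token vector.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    return [max(vec[i] for vec in token_vectors) for i in range(dim)]


# Stand-in for BiLSTM outputs: 3 tokens, 4 dimensions each (made-up numbers).
bilstm_outputs = [
    [0.1, -0.4, 0.7, 0.0],
    [0.5, 0.2, -0.3, 0.1],
    [-0.2, 0.9, 0.4, -0.6],
]

sentence_embedding = max_pool_over_time(bilstm_outputs)
# sentence_embedding == [0.5, 0.9, 0.7, 0.1]
```

Because the pooled vector has a fixed width regardless of sentence length, the premise and hypothesis embeddings can be fed to the same downstream classifier.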
