Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
DOI: 10.18653/v1/n18-1101

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Abstract: This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. At 433k examples, this resource is one of the largest corpora available for natural language inference (a.k.a. recognizing textual entailment), improving upon available resources in both its coverage and difficulty. MultiNLI accomplishes this by offering data from ten distinct genres of written and spoken English,…
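As a minimal, hedged illustration of how one might inspect the corpus described in the abstract, the Python sketch below loads MultiNLI through the Hugging Face datasets library. The hub identifier multi_nli and the premise/hypothesis/label/genre field names are assumptions based on a common distribution of the data and may differ in other releases.

# Minimal sketch: loading and inspecting MultiNLI.
# Assumes the Hugging Face datasets package and the hub identifier "multi_nli";
# split and column names may differ in other distributions of the corpus.
from datasets import load_dataset

mnli = load_dataset("multi_nli")

# MultiNLI ships matched and mismatched validation splits: matched shares
# genres with the training data, mismatched draws on held-out genres.
print(mnli)  # expected splits: train, validation_matched, validation_mismatched

example = mnli["train"][0]
print(example["premise"])
print(example["hypothesis"])
print(example["genre"])
# Labels follow the usual 3-way NLI scheme: 0 = entailment, 1 = neutral, 2 = contradiction.
print(example["label"])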

Cited by 2,501 publications (2,235 citation statements)
References 29 publications
“…The vocabulary size is 75K words. Table 7: Results on language inference on MultiNLI (Williams et al., 2017), matched/mismatched scenario (MNLI1/2).…”
Section: B Implementation and Experimental Details
confidence: 99%
“…We use White et al. (2017)'s Unified Semantic Evaluation Framework (USEF) that recasts three semantic phenomena as NLI: 1) semantic proto-roles, 2) paraphrastic inference, and 3) complex anaphora resolution. Additionally, we evaluate the NMT sentence representations on 4) Multi-NLI, a recent extension of the Stanford Natural Language Inference dataset (SNLI) (Bowman et al., 2015) that includes multiple genres and domains (Williams et al., 2017). We contextualize our results with a standard neural encoder described in Bowman et al. (2015) and used in White et al. (2017).…”
Section: Introduction
confidence: 99%
“…We demonstrated promising initial improvements based on multiple datasets and metrics, even when the entailment knowledge was extracted from a domain different from the summarization domain. Our next steps for this workshop paper include: (1) stronger summarization baselines, e.g., using a pointer copy mechanism (See et al., 2017; Nallapati et al., 2016), and also adding this capability to the entailment generation model; (2) results on the CNN/Daily Mail corpora (Nallapati et al., 2016); (3) incorporating entailment knowledge from other news-style domains such as the new Multi-NLI corpus (Williams et al., 2017); and (4) demonstrating mutual improvements on the entailment generation task.…”
Section: Conclusion and Next Steps
confidence: 99%
“…Importantly, these improvements are achieved despite the fact that the domain of the entailment dataset (image captions) is substantially different from the domain of the summarization datasets (general news), which suggests that the model is learning certain domain-independent inference skills. Our next steps for this workshop paper include incorporating stronger pointer-based models and employing the new multi-domain entailment corpus (Williams et al., 2017).…”
Section: Introduction
confidence: 99%
“…Given two sentences, the first being the premise and the second the hypothesis, the goal of NLI is to train a classifier to predict whether the relation of the hypothesis to the premise is one of entailment, contradiction, or a neutral relation. The training and test data for this 3-way classification task at RepEval 2017 are drawn from the Multi-Genre NLI, or MultiNLI, corpus (see Williams et al. (2017) for details). Task participants are provided with both training and development datasets, where parts of the development data match the training data in terms of genre, topic, etc.…”
Section: Introduction
confidence: 99%
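To make the 3-way classification format described in the citation statement above concrete, here is a minimal, purely illustrative Python sketch: it concatenates each premise and hypothesis, builds bag-of-words features, and fits a logistic-regression classifier over the entailment / neutral / contradiction labels. The toy sentence pairs and the scikit-learn pipeline are assumptions made for illustration only, not the RepEval 2017 setup or any model from the cited papers.

# Minimal sketch of the 3-way NLI classification format
# (entailment / neutral / contradiction). Illustrative toy baseline only;
# it is not a system from the cited papers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few hypothetical premise-hypothesis pairs with gold labels.
train_pairs = [
    ("A man is playing a guitar on stage.", "A person is performing music.", "entailment"),
    ("A man is playing a guitar on stage.", "The man is asleep at home.", "contradiction"),
    ("A man is playing a guitar on stage.", "The concert is sold out.", "neutral"),
]

# Concatenate premise and hypothesis into one string; real systems encode the
# two sentences separately or jointly with a neural encoder.
texts = [f"{premise} [SEP] {hypothesis}" for premise, hypothesis, _ in train_pairs]
labels = [label for _, _, label in train_pairs]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

print(clf.predict(["A woman reads a book. [SEP] Someone is reading."]))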