Enhanced LSTM for Natural Language Inference

Chen, Qian; Zhu, Xiaodan; Ling, Zhen-Hua; Wei, Si; Jiang, Hui; Inkpen, Diana

doi:10.18653/v1/p17-1152

Cited by 931 publications (858 citation statements, 825 of them classified as mentioning)

References 28 publications

“…Enhanced Sequential Information Model ESIM (Chen et al, 2017) performs inference in three stages. First, Input Encoding uses BiLSTMs to produce representations of each word in its context within premise or hypothesis.…”

Section: Models, mentioning (confidence: 99%)

Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness

Sánchez, Mitchell, Riedel (2018)

Proceedings of the 2018 Conference of the North American Chapter Of the Association for Computational Linguistics: Hu


Natural Language Inference is a challenging task that has received substantial attention, and state-of-the-art models now achieve impressive test set performance in the form of accuracy scores. Here, we go beyond this single evaluation metric to examine robustness to semantically-valid alterations to the input data. We identify three factors (insensitivity, polarity and unseen pairs) and compare their impact on three SNLI models under a variety of conditions. Our results demonstrate a number of strengths and weaknesses in the models' ability to generalise to new in-domain instances. In particular, while strong performance is possible on unseen hypernyms, unseen antonyms are more challenging for all the models. More generally, the models suffer from an insensitivity to certain small but semantically significant alterations, and are also often influenced by simple statistical correlations between words and training labels. Overall, we show that evaluations of NLI models can benefit from studying the influence of factors intrinsic to the models or found in the dataset used.
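The passage quoted above from Sánchez et al. describes ESIM's first stage, input encoding, in which a BiLSTM produces a context-dependent vector for every word of the premise or hypothesis. Below is a minimal pure-Python sketch of that idea, not the ESIM authors' code: the names `LSTMCell`, `run_lstm` and `bilstm_encode`, the random weight initialisation, and the tiny dimensions are all illustrative assumptions, and no training is performed.

```python
import math
import random

# Illustrative sketch only: a hypothetical BiLSTM encoder in pure Python,
# not taken from the ESIM code release. Weights are random and untrained.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """Minimal LSTM cell (hypothetical helper) with small random weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        width = input_dim + hidden_dim
        # One weight matrix and bias vector per gate:
        # i = input, f = forget, o = output, g = candidate cell value.
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_dim)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def step(self, x, h, c):
        z = x + h  # concatenate current input and previous hidden state
        pre = {gate: [s + b for s, b in zip(matvec(self.W[gate], z), self.b[gate])]
               for gate in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # New cell state: keep part of the old state, add part of the candidate.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_lstm(cell, xs):
    """Run a cell left-to-right over xs and return the hidden state at each step."""
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    states = []
    for x in xs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states


def bilstm_encode(xs, fwd, bwd):
    """Encode each token as [forward state ; backward state]."""
    forward = run_lstm(fwd, xs)
    # Run the backward cell right-to-left, then reverse so position t lines up.
    backward = run_lstm(bwd, xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


# Toy usage: a 4-token "premise" with made-up 3-dimensional embeddings.
rng = random.Random(42)
premise = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
fwd, bwd = LSTMCell(3, 2, seed=1), LSTMCell(3, 2, seed=2)
encoded = bilstm_encode(premise, fwd, bwd)
```

Each output vector concatenates a forward state (which has read the sentence up to that word) with a backward state (which has read it from the end back to that word), so every position carries context from both sides. In a real model the cells would be trained and the embeddings would typically come from pretrained word vectors.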


“…As the only large human-annotated corpus for NLI currently available, the Stanford NLI Corpus (SNLI; Bowman et al, 2015) has enabled a good deal of progress on NLU, serving as a major benchmark for machine learning work on sentence understanding and spurring work on core representation learning techniques for NLU, such as attention (Wang and Jiang, 2016; Parikh et al, 2016), memory (Munkhdalai and Yu, 2017), and the use of parse structure (Mou et al, 2016b; Bowman et al, 2016; Chen et al, 2017). However, SNLI falls short of providing a sufficient testing ground for machine learning models in two ways.…”

Section: Introduction, mentioning (confidence: 99%)

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Williams, Nangia, Bowman (2018)

Proceedings of the 2018 Conference of the North American Chapter Of the Association for Computational Linguistics: Hu


This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. At 433k examples, this resource is one of the largest corpora available for natural language inference (a.k.a. recognizing textual entailment), improving upon available resources in both its coverage and difficulty. MultiNLI accomplishes this by offering data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, while supplying an explicit setting for evaluating cross-genre domain adaptation. In addition, an evaluation using existing machine learning models designed for the Stanford NLI corpus shows that it represents a substantially more difficult task than does that corpus, despite the two showing similar levels of inter-annotator agreement.


“…For example, Part-Of-Speech (POS) tags are used for syntactic parsers. The parsers are used to improve higher-level tasks, such as natural language inference (Chen et al, 2016) and machine translation (Eriguchi et al, 2016). These systems are often pipelines and not trained end-to-end.…”

Section: Introduction, mentioning (confidence: 99%)

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Hashimoto, Xiong, Tsuruoka, et al. (2017)

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing


Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks. Ideally, the linguistic levels of morphology, syntax and semantics would benefit each other by being trained in a single model. We introduce a joint many-task model together with a strategy for successively growing its depth to solve increasingly complex tasks. Higher layers include shortcut connections to lower-level task predictions to reflect linguistic hierarchies. We use a simple regularization term to allow for optimizing all model weights to improve one task's loss without exhibiting catastrophic interference of the other tasks. Our single end-to-end model obtains state-of-the-art or competitive results on five different tasks from tagging, parsing, relatedness, and entailment tasks.
