Behavior Analysis of NLI Models: Uncovering the Influence of Three Factors on Robustness

Sánchez, Iván; Mitchell, Jeff; Riedel, Sebastian

doi:10.18653/v1/n18-1179

Cited by 35 publications

(28 citation statements)

References 16 publications

“…Several datasets were constructed by modifying or extracting examples from existing datasets. For instance, Sanchez et al (2018) and Glockner et al (2018) extracted examples from SNLI (Bowman et al, 2015) and replaced specific words such as hypernyms, synonyms, and antonyms, followed by manual verification. Linzen et al (2016), on the other hand, extracted examples of subject-verb agreement from raw texts using heuristics, resulting in a large-scale dataset.…”

Section: Construction Methods (mentioning)

confidence: 99%

Analysis Methods in Neural Language Processing: A Survey

Belinkov

Glass

2019

Transactions of the Association for Computational Linguistics

The field of natural language processing has seen impressive progress in recent years, with neural network models replacing many of the traditional systems. A plethora of new models have been proposed, many of which are thought to be opaque compared to their feature-rich counterparts. This has led researchers to analyze, interpret, and evaluate neural networks in novel and more fine-grained ways. In this survey paper, we review analysis methods in neural language processing, categorize them according to prominent research trends, highlight existing limitations, and point to potential directions for future work.

“…They showed with adversarial examples that most models can be easily tricked by modifications on the data which do not confuse humans. Similarly, Sanchez et al (2018) performed controlled experiments on the robustness of several Natural Language Inference models by altering hypernym, hyponym, and antonym relations in the data. Both studies revealed a major weakness of the models: They largely rely on pattern matching instead of human decision-making processes as required in the tasks, including heuristics (Gigerenzer and Gaissmaier, 2011) and elimination by aspects (Tversky, 1972).…”

Section: Introduction (mentioning)

confidence: 99%

Comparing Attention-Based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension

Blohm

Jagfeld

Sood et al.

2018

Proceedings of the 22nd Conference on Computational Natural Language Learning

We propose a machine reading comprehension model based on the compare-aggregate framework with two-staged attention that achieves state-of-the-art results on the MovieQA question answering dataset. To investigate the limitations of our model as well as the behavioral difference between convolutional and recurrent neural networks, we generate adversarial examples to confuse the model and compare to human performance. Furthermore, we assess the generalizability of our model by analyzing its differences to human inference, drawing upon insights from cognitive science.

“…Snorkel (Ratner et al, 2017), Hard/easy sets (Gururangan et al, 2018) Errudite Compositional-sensitivity Transformations NLPAug (Ma, 2019) Counterfactuals (Kaushik et al, 2019), Stress test (Naik et al, 2018), Bias factors (Sanchez et al, 2018) (Cooper et al, 1994), RTE (Dagan et al, 2005), SICK (Marelli et al, 2014), SNLI , MNLI (Williams et al, 2018), Checklist (Ribeiro et al, 2020) HANS (McCoy et al, 2019b), Quantified NLI (Geiger et al, 2018), MPE (Lai et al, 2017), EQUATE (Ravichander et al, 2019), DNC , ImpPres (Jeretic et al, 2020), Systematicity (Yanaka et al, 2020) ConjNLI (Saha et al, 2020), SherLIiC (Schmitt and Schütze, 2019) Example: author new movie reviews in the style of a newspaper columnist.…”

Section: Subpopulations (mentioning)

confidence: 99%

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

2021

We present the first multi-task learning model, named PhoNLP, for joint Vietnamese part-of-speech (POS) tagging, named entity recognition (NER) and dependency parsing. Experiments on Vietnamese benchmark datasets show that PhoNLP produces state-of-the-art results, outperforming a single-task learning approach that fine-tunes the pre-trained Vietnamese language model PhoBERT (Nguyen and Nguyen, 2020) for each task independently. We publicly release PhoNLP as an open-source toolkit under the Apache License 2.0. Although we specify PhoNLP for Vietnamese, our PhoNLP training and evaluation command scripts in fact can directly work for other languages that have a pre-trained BERT-based language model and gold annotated corpora available for the three tasks of POS tagging, NER and dependency parsing. We hope that PhoNLP can serve as a strong baseline and useful toolkit for future NLP research and applications to not only Vietnamese but also the other languages. Our PhoNLP is available at https://github.com/VinAIResearch/PhoNLP.
