Probing for semantic evidence of composition by means of simple classification tasks

Ettinger, Allyson; Elgohary, Ahmed; Resnik, Philip

doi:10.18653/v1/w16-2524

Cited by 122 publications

(107 citation statements)

References 16 publications

Supporting

Mentioning

105

Contrasting

“…An active line of work focuses on "probing" neural representations of language. Ettinger et al (2016, 2017); Zhu et al (2018), i.a., use a task-based approach similar to ours, where tasks that require a specific subset of linguistic knowledge are used to perform qualitative evaluation. Gulordava et al (2018), Giulianelli et al (2018), Rønning et al (2018), and Jumelet and Hupkes (2018) make a focused contribution towards a particular linguistic phenomenon (agreement, ellipsis, negative polarity).…”

Section: Related Work (mentioning)

confidence: 99%

Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

Kim, Patel, Poliak et al. 2019

Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

We introduce a set of nine challenge tasks that test for the understanding of function words. These tasks are created by structurally mutating sentences from existing datasets to target the comprehension of specific types of function words (e.g., prepositions, wh-words). Using these probing tasks, we explore the effects of various pretraining objectives for sentence encoders (e.g., language modeling, CCG supertagging and natural language inference (NLI)) on the learned representations. Our results show that pretraining on language modeling performs the best on average across our probing tasks, supporting its widespread use for pretraining state-of-the-art NLP models, and CCG supertagging and NLI pretraining perform comparably. Overall, no pretraining objective dominates across the board, and our function word probing tasks highlight several intuitive differences between pretraining objectives, e.g., that NLI helps the comprehension of negation.

“…Many of them employ the Transformer architecture (Vaswani et al, 2017) that uses multi-head self-attention to capture context. To assess the linguistic knowledge learned by pre-trained LMs, probing task methodology suggest training supervised models on top of the word representations (Ettinger et al, 2016; Hupkes et al, 2018; Belinkov and Glass, 2019; Hewitt and Liang, 2019). Investigated linguistic aspects span across morphology (Shi et al, 2016; Belinkov et al, 2017; Liu et al, 2019a), syntax (Tenney et al, 2019; Hewitt and Manning, 2019), and semantics (Conneau et al, 2018; Liu et al, 2019a).…”

Section: Related Work (mentioning)

confidence: 99%

Context Analysis for Pre-trained Masked Language Models

Lai, Lalwani, Zhang 2020

Findings of the Association for Computational Linguistics: EMNLP 2020

Pre-trained language models that learn contextualized word representations from a large unannotated corpus have become a standard component for many state-of-the-art NLP systems. Despite their successful applications in various downstream NLP tasks, the extent of contextual impact on the word representation has not been explored. In this paper, we present a detailed analysis of contextual impact in Transformer- and BiLSTM-based masked language models. We follow two different approaches to evaluate the impact of context: a masking based approach that is architecture agnostic, and a gradient based approach that requires back-propagation through networks. The findings suggest significant differences on the contextual impact between the two model architectures. Through further breakdown of analysis by syntactic categories, we find the contextual impact in Transformer-based MLM aligns well with linguistic intuition. We further explore the Transformer attention pruning based on our findings in contextual analysis.

“…We show that selectivity can be a guide in designing probes and interpreting probing results, complementary to random representation baselines; as of now, there is little consensus on how to design probes. Early probing papers used linear functions (Shi et al, 2016; Ettinger et al, 2016; Alain and Bengio, 2016), which are still used (Bisazza and Tump, 2018; Liu et al, 2019), but multi-layer perceptron (MLP) probes are at least as popular (Conneau et al, 2018; Adi et al, 2017; Ettinger et al, 2018). Arguments have been made for "simple" probes, e.g., that we want to find easily accessible information in a representation (Liu et al, 2019; Alain and Bengio, 2016).…”

Section: Introduction (mentioning)

confidence: 99%

Designing and Interpreting Probes with Control Tasks

Hewitt, Liang 2019

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conferen

Probes, supervised models trained to predict properties (like parts-of-speech) from representations (like ELMo), have achieved high accuracy on a range of linguistic tasks. But does this mean that the representations encode linguistic structure or just that the probe has learned the linguistic task? In this paper, we propose control tasks, which associate word types with random outputs, to complement linguistic tasks. By construction, these tasks can only be learned by the probe itself. So a good probe (one that reflects the representation) should be selective, achieving high linguistic task accuracy and low control task accuracy. The selectivity of a probe puts linguistic task accuracy in context with the probe's capacity to memorize from word types. We construct control tasks for English part-of-speech tagging and dependency edge prediction, and show that popular probes on ELMo representations are not selective. We also find that dropout, commonly used to control probe complexity, is ineffective for improving selectivity of MLPs, but that other forms of regularization are effective. Finally, we find that while probes on the first layer of ELMo yield slightly better part-of-speech tagging accuracy than the second, probes on the second layer are substantially more selective, which raises the question of which layer better represents parts-of-speech.
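
The control-task idea in this abstract can be illustrated with a minimal Python sketch. It assumes that selectivity is computed as linguistic task accuracy minus control task accuracy, which the abstract implies but does not state as a formula. The function names (`make_control_labels`, `accuracy`, `selectivity`), the toy sentence, and the example accuracies are hypothetical and are not taken from the paper's code or results.

```python
import random


def make_control_labels(tokens, num_labels, seed=0):
    """Build control-task labels: each word type gets one fixed random label.

    Every occurrence of the same word type receives the same label, so a
    probe can only do well on this task by memorizing word types.
    """
    rng = random.Random(seed)
    type_to_label = {}
    labels = []
    for tok in tokens:
        if tok not in type_to_label:
            type_to_label[tok] = rng.randrange(num_labels)
        labels.append(type_to_label[tok])
    return labels


def accuracy(predicted, gold):
    """Fraction of positions where the predicted label matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def selectivity(linguistic_acc, control_acc):
    """Assumed definition: linguistic task accuracy minus control task accuracy."""
    return linguistic_acc - control_acc


if __name__ == "__main__":
    tokens = ["the", "cat", "saw", "the", "dog"]
    control = make_control_labels(tokens, num_labels=5)
    # Both occurrences of "the" share a label by construction.
    print(control[0] == control[3])
    # Illustrative accuracies only, not results from the paper.
    print(round(selectivity(0.97, 0.92), 2))
```

Under this reading, a probe with high accuracy on both tasks has low selectivity, which suggests that its linguistic task accuracy may reflect memorization by the probe rather than structure in the representation.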
