“…Recently, several studies have focused on the remarkable potential of pre-trained language models, such as BERT (Devlin et al., 2019), to capture linguistic knowledge. They have shown that pretrained representations encode various linguistic properties (Tenney et al., 2019a; Talmor et al., 2020; Goodwin et al., 2020; Wu et al., 2020; Zhou and Srikumar, 2021; Chen et al., 2021; Tenney et al., 2019b), among them syntactic properties, such as parts of speech (Liu et al., 2019a) and dependency trees (Hewitt and Manning, 2019), and semantic properties, such as word senses (Reif et al., 2019) and semantic dependencies (Wu et al., 2021).…”
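To make the cited probing setup concrete, the sketch below shows the kind of linear-probing experiment used to test whether frozen representations encode a property such as part of speech (in the spirit of Liu et al., 2019a). It assumes the HuggingFace transformers library and PyTorch; the model name, tag count, and the random placeholder labels are illustrative assumptions, not details from the cited works, which train probes on gold annotations.

```python
# Minimal linear-probing sketch (illustrative only): freeze a pretrained
# encoder, extract token representations, and fit a small linear classifier
# on top of them. Real probing studies use gold POS annotations; the random
# labels below are placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()  # the encoder stays frozen; only the probe is trained

NUM_TAGS = 17  # e.g., the Universal POS tag set size (assumption)
probe = torch.nn.Linear(encoder.config.hidden_size, NUM_TAGS)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

sentences = ["The cat sat on the mat.", "Probing classifiers are simple."]

for epoch in range(3):
    for text in sentences:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():  # no gradients flow into the encoder
            hidden = encoder(**enc).last_hidden_state.squeeze(0)
        # Placeholder tags, one per wordpiece; a real probe would align
        # gold POS tags to the tokenizer's subword segmentation.
        labels = torch.randint(0, NUM_TAGS, (hidden.size(0),))
        logits = probe(hidden)
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

If the probe reaches high accuracy on held-out gold labels while the encoder stays frozen, the property is taken to be linearly recoverable from the representations, which is the evidence the cited studies report for syntactic and semantic properties.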