Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.254

Pareto Probing: Trading Off Accuracy for Complexity

Abstract: The question of how to probe contextual word representations for linguistic structure in a way that is both principled and useful has seen significant attention recently in the NLP literature. In our contribution to this discussion, we argue for a probe metric that reflects the fundamental trade-off between probe complexity and performance: the Pareto hypervolume. To measure complexity, we present a number of parametric and non-parametric metrics. Our experiments using Pareto hypervolume as an evaluation metric…
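To make the abstract's central quantity concrete, here is a minimal Python sketch of a two-dimensional Pareto hypervolume over (complexity, accuracy) probe results. It assumes complexity is normalized to [0, 1] with lower being better, and uses the worst case (1, 0) as the reference point; this is a plain-reading illustration, not the paper's exact implementation.

```python
import numpy as np

def pareto_hypervolume(points, ref=(1.0, 0.0)):
    """Area dominated by (complexity, accuracy) probe results.

    points: iterable of (complexity, accuracy) pairs in [0, 1];
            lower complexity and higher accuracy are better.
    ref:    worst-case reference point (max complexity, min accuracy).
    """
    pts = np.asarray(points, dtype=float)
    pts = pts[np.argsort(pts[:, 0])]       # cheapest probes first
    frontier, best_acc = [], -np.inf
    for c, a in pts:                       # keep only non-dominated points
        if a > best_acc:
            frontier.append((c, a))
            best_acc = a
    hv = 0.0
    for i, (c, a) in enumerate(frontier):  # sum rectangular slabs
        next_c = frontier[i + 1][0] if i + 1 < len(frontier) else ref[0]
        hv += (next_c - c) * (a - ref[1])
    return hv

# Example: the (0.5, 0.70) probe is dominated by (0.3, 0.72) and adds no volume.
print(pareto_hypervolume([(0.1, 0.55), (0.3, 0.72), (0.6, 0.80), (0.5, 0.70)]))
```

A larger hypervolume means the family of probes achieves higher accuracy at lower complexity, which is exactly the trade-off the metric is meant to summarize.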

Cited by 46 publications (69 citation statements) · References 32 publications
“…Other works take an information-theoretic view: Voita and Titov (2020) measure the complexity of the probe in terms of the bits needed to transmit its parameters, while Pimentel et al. (2020b) argue that probing should measure mutual information between the representation and the property. Pimentel et al. (2020a) propose a Pareto approach where they plot accuracy versus probe complexity, unifying several of these goals. We use these proposed metrics to compare our probing method to standard probing approaches.…”
Section: Related Work (mentioning)
confidence: 99%
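The MDL framing mentioned here has two common instantiations; the sketch below shows the online (prequential) code from Voita and Titov's framework rather than the variational parameter-transmission code the excerpt cites. The function name, split schedule, and use of scikit-learn's LogisticRegression are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def online_codelength(X, y, splits=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Prequential (online) description length of the labels, in bits.

    Each block is encoded with a probe trained on all earlier blocks;
    a shorter total codelength indicates the property is easier to
    extract. Assumes every label class appears in the first block.
    """
    y = np.asarray(y)
    idx = [int(f * len(y)) for f in splits]
    # The first block is transmitted with a uniform code over the labels.
    bits = idx[0] * np.log2(len(set(y)))
    for lo, hi in zip(idx, idx[1:]):
        clf = LogisticRegression(max_iter=1000).fit(X[:lo], y[:lo])
        log_probs = clf.predict_log_proba(X[lo:hi])
        cols = clf.classes_.searchsorted(y[lo:hi])  # map labels to columns
        picked = log_probs[np.arange(hi - lo), cols]
        bits -= picked.sum() / np.log(2)  # natural log-probs -> bits
    return bits
```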
“…We first vary the complexity of each probe, where for subnetwork probing we associate multiple encoder weights with a single mask, and for the MLP probe we restrict the rank of the hidden layer. We then plot the resulting accuracy-complexity curve (Pimentel et al., 2020a).…”
Section: Probe Evaluation (mentioning)
confidence: 99%
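As one concrete reading of "restrict the rank of the hidden layer," here is a minimal PyTorch sketch that factors the probe's input-to-hidden weight through a rank-r bottleneck; the class name and dimensions are hypothetical, not taken from the cited work.

```python
import torch
import torch.nn as nn

class LowRankMLPProbe(nn.Module):
    """MLP probe whose input-to-hidden transform is capped at rank r.

    Writing the d_in x d_hidden weight as a product of d_in x r and
    r x d_hidden matrices bounds its rank by r, giving a single knob
    for probe complexity.
    """
    def __init__(self, d_in, d_hidden, n_classes, rank):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> r
        self.up = nn.Linear(rank, d_hidden)            # r -> d_hidden
        self.out = nn.Linear(d_hidden, n_classes)

    def forward(self, x):
        return self.out(torch.relu(self.up(self.down(x))))

# Sweeping the rank traces out an accuracy-complexity curve
# (768-dim encoder states and 45 tags are illustrative numbers).
probes = [LowRankMLPProbe(768, 256, 45, rank=r) for r in (2, 8, 32, 128)]
```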
“…Early probing studies in NLP include Zhang and Bowman (2018) and Tenney et al. (2019c), the former being an early example of the importance of comparing with randomized representations or labels. Further discussion has introduced control tasks and the selectivity metric (Hewitt and Liang, 2019), formalized notions of ease of extraction (Voita and Titov, 2020), and described other strategies for taking model complexity into account (Pimentel et al., 2020a).…”
Section: Related Work (mentioning)
confidence: 99%
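A minimal sketch of the control-task idea referenced above, with a hypothetical function name: each word type receives a fixed random label, so a probe can only score well on the control task by memorizing types rather than reading structure out of the encoder.

```python
import random

def control_task_labels(tokens, n_classes, seed=0):
    """Control task in the spirit of Hewitt and Liang (2019): assign
    each word *type* one random label, held fixed across the corpus."""
    rng = random.Random(seed)
    table = {}
    return [table.setdefault(t, rng.randrange(n_classes)) for t in tokens]

# Selectivity = accuracy on the real task minus accuracy on this
# control task; high selectivity suggests the probe is not merely
# memorizing word identities.
```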
“…Probing (Conneau et al., 2018) is one prominent method, which consists of using a lightly parameterized model to predict linguistic phenomena from intermediate representations, although recent work has raised concerns about how model parameterization and evaluation metrics may affect the effectiveness of this approach (Hewitt and Liang, 2019; Pimentel et al., 2020b; Maudslay et al., 2020; Pimentel et al., 2020a). Most work in intrinsic probing has focused on the identification of individual neurons that are important for a task (Li et al., 2016; Kádár et al., 2017; Li et al., 2017; Lakretz et al., 2019).…”
Section: Related Work (mentioning)
confidence: 99%