Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
DOI: 10.18653/v1/2021.eacl-main.191

On the evolution of syntactic information encoded by BERT’s contextualized representations

Abstract: The adaptation of pretrained language models to solve supervised tasks has become a baseline in NLP, and many recent works have focused on studying how linguistic information is encoded in the pretrained sentence representations. Among other information, it has been shown that entire syntax trees are implicitly embedded in the geometry of such models. As these models are often fine-tuned, it becomes increasingly important to understand how the encoded knowledge evolves along the fine-tuning. In this paper, we …

Cited by 4 publications (3 citation statements) · References 30 publications
“…A combination of techniques may also be used to bring downstream tasks back into the picture. In a recent study, Pérez-Mayos et al. (2021) uses structural probing, not to assess whether a single static model has learned syntax or not, but to track how syntactic capabilities evolve as a pre-trained model is fine-tuned for different tasks. One could imagine a similar experimental design using TSE instead of (or together with) probing.…”
Section: Discussion (mentioning)
confidence: 99%
“…BERT and related models have been investigated extensively with different strategies over the last years, finding that BERT learns syntactic (Hewitt and Manning, 2019; Tenney et al., 2019) and semantic representations (Ettinger, 2020) that are generally preserved through fine-tuning for semantic tasks like paraphrasing (Pérez-Mayos et al., 2021). However, Hessel and Schofield (2021) find that on the GLUE tasks (Wang et al., 2018), BERT is relatively insensitive to shuffling of the input sentences, which removes many syntactic clues in English.…”
Section: Related Work (mentioning)
confidence: 99%
“…One body of work approaches the problem by applying heuristic rules of perturbation to input sequences (Wallace et al., 2019; Jia and Liang, 2017), while another uses neural models to construct adversarial examples (Li et al., 2020, 2018) or manipulate inputs in embedding space (Jin et al., 2020). Our work also contributes to efforts to understand impacts and outcomes of the fine-tuning process (Miaschi et al., 2020; Mosbach et al., 2020; Wang et al., 2020; Perez-Mayos et al., 2021).…”
Section: Related Work (mentioning)
confidence: 99%