2022
DOI: 10.48550/arxiv.2205.11758
Preprint

Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models

Abstract: The emergent cross-lingual transfer seen in multilingual pretrained models has sparked significant interest in studying their behavior. However, because these analyses have focused on fully trained multilingual models, little is known about the dynamics of the multilingual pretraining process. We investigate when these models acquire their in-language and cross-lingual abilities by probing checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks. Our analysis shows that the model a…

Cited by 2 publications (2 citation statements); references 21 publications.
“…Thus we conclude that on average the models gradually start to perform more cross-lingual sharing as fine-tuning progresses. Moreover, in line with previous findings (Blevins et al, 2022), we observe that the amount of cross-lingual sharing between different language-pairs fluctuates during fine-tuning (see Appendix C for results). To test whether the ranked influence scores between epochs are statistically significantly different, we apply the Wilcoxon signed-rank test (Wilcoxon, 1992), and confirm that between all epochs this holds true (p-value < 0.05).…”
Section: Results (supporting)
confidence: 92%
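
The epoch-to-epoch comparison described in the quoted passage can be sketched with scipy's Wilcoxon signed-rank test. The sketch below assumes the test is applied to per-language influence scores paired across two fine-tuning epochs; the score values, language count, and variable names are illustrative placeholders, not data from the cited work.

# A minimal sketch, assuming paired per-language influence scores from two
# fine-tuning epochs. All values here are hypothetical placeholders; the
# Wilcoxon signed-rank test is itself rank-based.
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical cross-lingual influence scores for the same set of source
# languages, measured after two different fine-tuning epochs (paired by language).
scores_epoch_1 = np.array([0.42, 0.18, 0.31, 0.09, 0.27, 0.15])
scores_epoch_2 = np.array([0.35, 0.22, 0.29, 0.14, 0.33, 0.12])

# Paired, non-parametric test of whether the two epochs differ systematically.
statistic, p_value = wilcoxon(scores_epoch_1, scores_epoch_2)
print(f"Wilcoxon statistic = {statistic:.2f}, p-value = {p_value:.4f}")

# Following the quoted criterion (p-value < 0.05), a small p-value would
# indicate that the influence scores differ significantly between epochs.
if p_value < 0.05:
    print("Influence scores differ significantly between the two epochs.")

In the cited work this comparison is reported for every pair of fine-tuning epochs; the single-pair call above is just the smallest unit of that procedure.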
“…While in previous results we reported the sum of these scores, we now analyze them separately per fine-tuning epoch. Blevins et al (2022) study cross-lingual pretraining dynamics of multilingual models to see when crosslingual sharing emerges. We instead study whether different patterns emerge when looking at language influence across fine-tuning.…”
Section: Sharing Dynamics During Fine-tuning (mentioning)
confidence: 99%