2023
DOI: 10.1162/coli_a_00472

Data-driven Cross-lingual Syntax: An Agreement Study with Massively Multilingual Models

Abstract: Massively multilingual models such as mBERT and XLM-R are increasingly valued in Natural Language Processing research and applications, due to their ability to tackle the uneven distribution of resources available for different languages. The models’ ability to process multiple languages relying on a shared set of parameters raises the question of whether the grammatical knowledge they extracted during pre-training can be considered as a data-driven cross-lingual grammar. The present work studies the inner workings of mBERT and XLM-R…

Cited by 6 publications (5 citation statements)
References 49 publications
“…The results reported here seem to bode well for the crosslingual capacities of multilingual language models. They indicate shared representations of grammatical structure across languages (in line with Chi et al., 2020; Chang et al., 2022; de Varda and Marelli, 2023), and they show that these representations have a causal role in language generation.…”
Section: Implications for Multilingual Models (mentioning)
confidence: 70%
“…However, the structural priming paradigm has not been applied to modern multilingual language models. Previous work has demonstrated that multilingual language models encode grammatical features in shared subspaces across languages (Chi et al., 2020; Chang et al., 2022; de Varda and Marelli, 2023), largely relying on probing methods that do not establish causal effects on model predictions. Crosslingual structural priming would provide evidence that the abstract grammatical representations shared across languages in the models have causal effects on model-generated text.…”
Section: Introduction (mentioning)
confidence: 99%
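The statement above contrasts probing with structural priming as a way to test whether shared grammatical representations causally influence generation. As a rough illustration of how priming effects are typically quantified in language models, the following is a minimal sketch that scores a target sentence's log-probability after a structurally congruent versus incongruent prime. The checkpoint facebook/xglm-564M, the example sentences, and the scoring helper are illustrative assumptions, not the setup of the cited papers.

```python
# Hedged sketch: a cross-lingual structural priming effect measured as the change
# in target log-probability after congruent vs. incongruent prime sentences.
# Model checkpoint and sentences are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "facebook/xglm-564M"  # any multilingual causal LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def target_logprob(prime: str, target: str) -> float:
    """Sum of log-probabilities of the target tokens, conditioned on the prime."""
    prime_ids = tokenizer(prime, return_tensors="pt").input_ids
    full_ids = tokenizer(prime + " " + target, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score only the target tokens (positions after the prime).
    for pos in range(prime_ids.shape[1], full_ids.shape[1]):
        token_id = full_ids[0, pos]
        total += log_probs[0, pos - 1, token_id].item()
    return total

# German primes (passive vs. active), English target in the passive voice.
passive_prime = "Der Brief wurde von dem Jungen geschrieben."
active_prime = "Der Junge schrieb den Brief."
target = "The ball was kicked by the girl."

effect = target_logprob(passive_prime, target) - target_logprob(active_prime, target)
print(f"Cross-lingual priming effect (log-prob difference): {effect:+.3f}")
```

A positive difference would indicate that the passive German prime made the passive English continuation more likely, which is the kind of causal, generation-level evidence the excerpt says probing alone cannot provide.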
“…Following the insight already presented in [15], future development of this work will consist of exploring cross-lingual approaches [74].…”
Section: Discussion (mentioning)
confidence: 99%
“…where $J$ is the length of the given target sentence $y$, and $V$ is the vocabulary. $P\left(y_j = k \mid x, y_{<j}; \theta_T\right)$ is calculated by Equation (9).…”
Section: Knowledge Extraction Paradigm (mentioning)
confidence: 99%
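Equation (9) belongs to the citing paper and is not reproduced in the excerpt. The symbols it mentions (target length $J$, vocabulary $V$, teacher parameters $\theta_T$) are consistent with the standard word-level knowledge-distillation objective; the following is a plausible reconstruction under that assumption, with the student parameters $\theta_S$ added for illustration only.

```latex
% Hedged reconstruction: a standard word-level knowledge-distillation loss
% consistent with the symbols in the excerpt (J = target length, V = vocabulary,
% \theta_T = teacher parameters; \theta_S = student parameters is an assumption).
\begin{equation}
\mathcal{L}_{\mathrm{KD}}
  = -\sum_{j=1}^{J} \sum_{k=1}^{|V|}
      P\!\left(y_j = k \mid x, y_{<j}; \theta_T\right)
      \log P\!\left(y_j = k \mid x, y_{<j}; \theta_S\right)
\end{equation}
```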
“…The experimental results largely show that the layers of the model follow the so-called classic NLP pipeline, with lower layers processing part-of-speech and other morphological information, middle layers handling more complex syntactic relationships, and higher layers dealing with higher-level linguistic phenomena such as anaphora and coreference. In 2023, de Varda et al. [9] studied the inner workings of mBERT and XLM-R to test how single neural units respond to a precise grammatical phenomenon (i.e., number agreement) in five languages (English, German, French, Hebrew, and Russian), and assessed the cross-language consistency of these units.…”
Section: Introduction (mentioning)
confidence: 99%
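The excerpt above describes testing whether single neural units respond to number agreement across languages. One common way to operationalize this is to compare a unit's activation on matched grammatical and ungrammatical minimal pairs; the sketch below does so with bert-base-multilingual-cased. The sentence pairs, layer index, and unit index are illustrative assumptions, not the paper's actual protocol.

```python
# Hedged sketch: does one hidden unit in mBERT distinguish number agreement?
# We compare the unit's activation at the verb position in grammatical vs.
# ungrammatical minimal pairs. Sentences, layer, and unit index are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

LAYER, UNIT = 8, 397  # hypothetical layer and unit of interest

def unit_activation(sentence: str, target_word: str) -> float:
    """Activation of one hidden unit at the (first subword of the) target word."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[LAYER][0]  # (seq_len, hidden_size)
    target_id = tokenizer(target_word, add_special_tokens=False).input_ids[0]
    pos = (enc.input_ids[0] == target_id).nonzero()[0].item()
    return hidden[pos, UNIT].item()

# Grammatical vs. ungrammatical agreement pairs in two of the study's languages.
pairs = [
    ("en", "The keys to the cabinet are on the table.",
           "The keys to the cabinet is on the table.", "are", "is"),
    ("de", "Die Hunde im Garten bellen laut.",
           "Die Hunde im Garten bellt laut.", "bellen", "bellt"),
]

for lang, gram, ungram, verb_ok, verb_bad in pairs:
    diff = unit_activation(gram, verb_ok) - unit_activation(ungram, verb_bad)
    print(f"[{lang}] grammatical - ungrammatical activation = {diff:+.3f}")
```

A unit that reacts to agreement violations consistently across the languages (differences of the same sign and comparable magnitude) would be a candidate for the kind of cross-lingually shared grammatical machinery the study investigates.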