Data-driven Cross-lingual Syntax: An Agreement Study with Massively Multilingual Models

Varda, Andrea Gregor de; Marelli, Marco

doi:10.1162/coli_a_00472

Cited by 6 publications

(5 citation statements)

References 49 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The results reported here seem to bode well for the crosslingual capacities of multilingual language models. They indicate shared representations of grammatical structure across languages (in line with Chi et al, 2020;Chang et al, 2022;de Varda and Marelli, 2023), and they show that these representations have a causal role in language generation.…”

Section: Implications For Multilingual Modelsmentioning

confidence: 70%

“…However, the structural priming paradigm has not been applied to modern multilingual language models. Previous work has demonstrated that multilingual language models encode grammatical features in shared subspaces across languages (Chi et al, 2020;Chang et al, 2022;de Varda and Marelli, 2023), largely relying on probing methods that do not establish causal effects on model predictions. Crosslingual structural priming would provide evidence that the abstract grammatical representations shared across languages in the models have causal effects on model-generated text.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

Michaelov,

Arnett,

Chang

et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

View full text Add to dashboard Cite

grammatical knowledge-of parts of speech and grammatical patterns-is key to the capacity for linguistic generalization in humans. But how abstract is grammatical knowledge in large language models? In the human literature, compelling evidence for grammatical abstraction comes from structural priming. A sentence that shares the same grammatical structure as a preceding sentence is processed and produced more readily. Because confounds exist when using stimuli in a single language, evidence of abstraction is even more compelling from crosslingual structural priming, where use of a syntactic structure in one language primes an analogous structure in another language. We measure crosslingual structural priming in large language models, comparing model behavior to human experimental results from eight crosslingual experiments covering six languages, and four monolingual structural priming experiments in three non-English languages. We find evidence for abstract monolingual and crosslingual grammatical representations in the models that function similarly to those found in humans. These results demonstrate that grammatical representations in multilingual language models are not only similar across languages, but they can causally influence text produced in different languages.

show abstract

Section: Implications For Multilingual Modelsmentioning

confidence: 70%

Section: Introductionmentioning

confidence: 99%

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

Michaelov,

Arnett,

Chang

et al. 2023

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

View full text Add to dashboard Cite

show abstract

“…Following the insight already presented in [15], the work's future development consists of exploring the possibilities of cross-lingual approaches [74].…”

Section: Discussionmentioning

confidence: 99%

Raising the Bar on Acceptability Judgments Classification: An Experiment on ItaCoLA Using ELECTRA

Guarasci,

Minutolo,

Buonaiuto

et al. 2024

Electronics

View full text Add to dashboard Cite

The task of automatically evaluating acceptability judgments has relished increasing success in Natural Language Processing, starting from including the Corpus of Linguistic Acceptability (CoLa) in the GLUE benchmark dataset. CoLa spawned a thread that led to the development of several similar datasets in different languages, broadening the investigation possibilities to many languages other than English. In this study, leveraging the Italian Corpus of Linguistic Acceptability (ItaCoLA), comprising nearly 10,000 sentences with acceptability judgments, we propose a new methodology that utilizes the neural language model ELECTRA. This approach exceeds the scores obtained from current baselines and demonstrates that it can overcome language-specific limitations in dealing with specific phenomena.

show abstract

“…where J is the length of the given target sentence y, and V is the vocabulary. P y j = k x, y <j ; θ T is calculated by Equation (9).…”

Section: Knowledge Extraction Paradigmmentioning

confidence: 99%

“…The experimental results largely explain that the layers of the model follow the so-called classic NLP pipeline principles, with lower levels specifically processing part-of-speech and other morphological information, middle layers responsible for more complex syntactic relationships, and higher levels dealing with higher-level linguistic phenomena such as anaphora and reference. In 2023, AG Varda et al [9] studied the inner workings of mBERT and XLM-R to test the performance of single neural units responding to precise grammatical phenomena (i.e., number agreement) in five languages (English, German, French, Hebrew, and Russian). Cross-language consistency.…”

Section: Introductionmentioning

confidence: 99%

A Mongolian-Chinese Neural Machine Translation Model Based on Soft Target Templates and Contextual Knowledge

Ren,

Pang,

Lang

2023

Applied Sciences

View full text Add to dashboard Cite

In recent years, Mongolian-Chinese neural machine translation (MCNMT) technology has made substantial progress. However, the establishment of the Mongolian dataset requires a significant amount of financial and material investment, which has become a major obstacle to the performance of MCNMT. Pre-training and fine-tuning technology have also achieved great success in the field of natural language processing, but how to fully exploit the potential of pre-training language models (PLMs) in MCNMT has become an urgent problem to be solved. Therefore, this paper proposes a novel MCNMT model based on the soft target template and contextual knowledge. Firstly, to learn the grammatical structure of target sentences, a selection-based parsing tree is adopted to generate candidate templates that are used as soft target templates. The template information is merged with the encoder-decoder framework, fully utilizing the templates and source text information to guide the translation process. Secondly, the translation model learns the contextual knowledge of sentences from the BERT pre-training model through the dynamic fusion mechanism and knowledge extraction paradigm, so as to improve the model’s utilization rate of language knowledge. Finally, the translation performance of the proposed model is further improved by integrating contextual knowledge and soft target templates by using a scaling factor. The effectiveness of the modified model is verified by a large number of data experiments, and the calculated BLEU (BiLingual Evaluation Understudy) value is increased by 4.032 points compared with the baseline MCNMT model of Transformers.

show abstract

Data-driven Cross-lingual Syntax: An Agreement Study with Massively Multilingual Models

Cited by 6 publications

References 49 publications

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

Raising the Bar on Acceptability Judgments Classification: An Experiment on ItaCoLA Using ELECTRA

A Mongolian-Chinese Neural Machine Translation Model Based on Soft Target Templates and Contextual Knowledge

Contact Info

Product

Resources

About