Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.20

Multilingual BERT Post-Pretraining Alignment

Abstract: We propose a simple method to align multilingual contextual embeddings as a post-pretraining step for improved cross-lingual transferability of the pretrained language models. Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling objective as well as on the sentence level via contrastive learning and random input shuffling. We also perform sentence-level code-switching with English when finetuning on downstream tasks. On XNLI, our best mo…
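To make the word-level objective concrete, the sketch below shows one way to build a Translation Language Modeling (TLM) example from a parallel sentence pair: the two sentences are concatenated and tokens on both sides are randomly masked, so the model can attend across languages when recovering them. The token ids, special-token ids, and 15% masking rate are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of TLM-style input construction (assumptions: BERT-style
# special-token ids and a flat 15% masking rate).
import random
import torch

CLS_ID, SEP_ID, MASK_ID, IGNORE = 101, 102, 103, -100  # assumed ids

def make_tlm_example(src_ids, tgt_ids, mask_prob=0.15):
    """Concatenate [CLS] src [SEP] tgt [SEP] and apply random masking."""
    input_ids = [CLS_ID] + src_ids + [SEP_ID] + tgt_ids + [SEP_ID]
    labels = [IGNORE] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if tok in (CLS_ID, SEP_ID):
            continue
        if random.random() < mask_prob:
            labels[i] = tok          # predict the original token here
            input_ids[i] = MASK_ID   # replace the input with [MASK]
    return torch.tensor(input_ids), torch.tensor(labels)

# Toy usage with made-up token ids for a parallel pair:
src = [2023, 2003, 1037, 3231]
tgt = [10468, 9765, 4895, 3231]
inputs, labels = make_tlm_example(src, tgt)
```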

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
17
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
3
3
2

Relationship

0
8

Authors

Journals

citations
Cited by 20 publications
(17 citation statements)
references
References 19 publications
0
17
0
Order By: Relevance
“…Although we can have different loss functions to optimise XLM-K, we choose contrastive learning due to its promising results in both visual representations (He et al. 2020; Chen et al. 2020) and cross-lingual pre-training (Chi et al. 2021; Pan et al. 2021). Intuitively, by distinguishing the positive sample from the negative samples using the contrastive loss, the model stores expressive knowledge acquired from the structured data.…”
Section: Joint Pre-training Objective
confidence: 99%
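As an illustration of the contrastive objective this citing paper describes, the sketch below computes an InfoNCE-style loss over pooled embeddings of parallel sentences, treating each sentence's translation as the positive and the other translations in the batch as negatives. The pooling strategy, temperature, and symmetric formulation are assumptions, not the exact recipe of XLM-K, InfoXLM, or the aligned mBERT model.

```python
# Minimal sketch of a sentence-level contrastive alignment loss with
# in-batch negatives (temperature and pooling are assumed).
import torch
import torch.nn.functional as F

def sentence_contrastive_loss(src_emb, tgt_emb, temperature=0.05):
    """src_emb, tgt_emb: (batch, dim) pooled embeddings of translation pairs.
    Each source sentence treats its own translation as the positive and the
    other translations in the batch as negatives (and vice versa)."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.t() / temperature            # (batch, batch) similarities
    targets = torch.arange(src.size(0), device=src.device)
    # Symmetric loss: source-to-target and target-to-source retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with random stand-in embeddings:
loss = sentence_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```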
“…The results on MLQA are shown in Table 1; we compare our model with mBERT (Lewis et al. 2020), XLM (Lewis et al. 2020), mBERT + PPA (Pan et al. 2021), Unicoder (Huang et al. 2019) and XLM-R base (Conneau et al. 2020). Since F1 and EM scores lead to similar observations, we take F1 scores for analysis:…”
Section: Downstream Task Evaluation
confidence: 99%
“…InfoXLM (Chi et al. 2020b) proposed a pre-training task based on contrastive learning, from an information-theoretic perspective. Pan et al. (2020) also introduced an alignment method based on contrastive learning. Cao, Kitaev, and Klein (2020) proposed an explicit word-level alignment procedure.…”
Section: Related Work
confidence: 99%
“…With the word-level contrastive objective, we observed significant BLEU score improvements on language pairs such as en-ro, en-et and en-my for mBART FT, as presented in Table 2. However, noisy word pairs (Pan et al., 2021a) extracted via word alignment toolkits lead to poor supervision signals for improving sentence retrieval P@1, which in turn prevents some language pairs such as en-kk from exhibiting BLEU improvements. We found that for en-kk, the numbers of extracted word pairs per sentence by word2word and FastAlign are 1.0 and 2.2, respectively.…”
Section: Word-level Contrastive Objective and Sentence Retrieval P@1
confidence: 99%
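For reference, a word-level contrastive objective of the kind discussed here can be sketched as follows: contextual states of toolkit-aligned token pairs are pulled together while the other target tokens in the same sentence serve as negatives. The alignment input format, negative sampling, and temperature are assumptions for illustration; noisy alignment links become wrong positives in this loss, which is consistent with the weak supervision signal the citing authors report.

```python
# Minimal sketch of a word-level contrastive loss over alignment links from an
# external toolkit (e.g. FastAlign); input format and temperature are assumed.
import torch
import torch.nn.functional as F

def word_contrastive_loss(src_hidden, tgt_hidden, alignments, temperature=0.1):
    """src_hidden: (src_len, dim), tgt_hidden: (tgt_len, dim) contextual states
    of one sentence pair; alignments: list of (src_pos, tgt_pos) index pairs.
    Each aligned source token must pick out its aligned target token against
    all other target tokens in the sentence (in-sentence negatives)."""
    src = F.normalize(src_hidden, dim=-1)
    tgt = F.normalize(tgt_hidden, dim=-1)
    src_idx = torch.tensor([i for i, _ in alignments])
    tgt_idx = torch.tensor([j for _, j in alignments])
    logits = src[src_idx] @ tgt.t() / temperature   # (pairs, tgt_len)
    return F.cross_entropy(logits, tgt_idx)

# Usage with random stand-in hidden states and two toy alignment links:
loss = word_contrastive_loss(torch.randn(5, 768), torch.randn(6, 768),
                             [(0, 0), (2, 3)])
```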