Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

Huang, Haoyang; Liang, Yilong; Duan, Nan; Gong, Ming; Shou, Linjun; Jiang, Daxin; Zhou, Ming

doi:10.18653/v1/d19-1252

Cited by 157 publications

(141 citation statements)

References 17 publications

Supporting

Mentioning

141

Contrasting


“…Current crosslingual models work by pre-training multilingual representations using some form of language modeling, which are then fine-tuned on the relevant task and transferred to different languages. Some authors leverage parallel data to that end (Conneau and Lample, 2019; Huang et al, 2019), but training a model akin to BERT (Devlin et al, 2019) on the combination of monolingual corpora in multiple languages is also effective (Conneau et al, 2020). Closely related to our work, Singh et al (2019) showed that replacing segments of the training data with their translation during fine-tuning is helpful.…”

Section: Related Work (mentioning)

confidence: 68%

Translation Artifacts in Cross-lingual Transfer Learning

Artetxe, Labaka, Agirre

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique. In this paper, we show that such translation process can introduce subtle artifacts that have a notable impact in existing cross-lingual models. For instance, in natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them, which current models are highly sensitive to. We show that some previous findings in cross-lingual transfer learning need to be reconsidered in the light of this phenomenon. Based on the gained insights, we also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.


“…Cao et al (2020) improve the multilinguality of mBERT by introducing a regularization term in the objective, similar to the creation of static multilingual embedding spaces. Huang et al (2019) extend mBERT pretraining with three additional tasks and show an improved overall performance. More recently, better multilinguality is achieved by Pfeiffer et al (2020) (adapters) and Chi et al (2020) (parallel data).…”

Section: Related Work (mentioning)

confidence: 95%

Identifying Elements Essential for BERT’s Multilinguality

Dufter, Schütze

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


It has been shown that multilingual BERT (mBERT) yields high quality multilingual representations and enables effective zero-shot transfer. This is surprising given that mBERT does not use any crosslingual signal during training. While recent literature has studied this phenomenon, the reasons for the multilinguality are still somewhat obscure. We aim to identify architectural properties of BERT and linguistic properties of languages that are necessary for BERT to become multilingual. To allow for fast experimentation we propose an efficient setup with small BERT models trained on a mix of synthetic and natural data. Overall, we identify four architectural and two linguistic elements that influence multilinguality. Based on our insights, we experiment with a multilingual pretraining setup that modifies the masking strategy using VecMap, i.e., unsupervised embedding alignment. Experiments on XNLI with three languages indicate that our findings transfer from our small setup to larger scale settings.


“…They show that vocabulary overlap is not required for multilingual models, and suggest that abstractions shared across languages emerge automatically during pretraining. Another line of research investigate how to further improve these shared knowledge, such as applying post-hoc alignment (Wang et al, 2020b; Cao et al, 2020) and utilizing better calibrated training signal (Mulcaire et al, 2019; Huang et al, 2019). While prior work emphasize how to share to improve transferability, we study multilingual models from a different perspective of how to unshare to resolve language conflicts.…”

Section: Related Work (mentioning)

confidence: 99%

On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment

Wang, Tsvetkov

2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Modern multilingual models are trained on concatenated text from multiple languages in hopes of conferring benefits to each (positive transfer), with the most pronounced benefits accruing to low-resource languages. However, recent work has shown that this approach can degrade performance on high-resource languages, a phenomenon known as negative interference. In this paper, we present the first systematic study of negative interference. We show that, contrary to previous belief, negative interference also impacts low-resource languages. While parameters are maximally shared to learn language-universal structures, we demonstrate that language-specific parameters do exist in multilingual models and they are a potential cause of negative interference. Motivated by these observations, we also present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference, by adding language-specific layers as meta-parameters and training them in a manner that explicitly improves shared layers' generalization on all languages. Overall, our results show that negative interference is more common than previously known, suggesting new directions for improving multilingual representations.

