Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1252

Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks

Abstract: We present Unicoder, a universal language encoder that is insensitive to different languages. Given an arbitrary NLP task, a model can be trained with Unicoder using training data in one language and directly applied to inputs of the same task in other languages. Compared to similar efforts such as Multilingual BERT (Devlin et al., 2018) and XLM (Lample and Conneau, 2019), three new cross-lingual pre-training tasks are proposed, including cross-lingual word recovery, cross-lingual paraphrase classification and…
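The setup the abstract describes can be illustrated with a short sketch: fine-tune a multilingual encoder on task data in one language, then apply the same fine-tuned model to other languages with no further training. The sketch below is an assumption-laden illustration, not the paper's own code; it uses Multilingual BERT from the Hugging Face transformers library as a stand-in encoder (Unicoder checkpoints are not assumed to be available) and toy sentiment data, with illustrative hyperparameters.

```python
# Minimal sketch of zero-shot cross-lingual transfer: fine-tune on English,
# then evaluate directly on other languages. mBERT is a stand-in encoder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # stand-in for Unicoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy English training examples (label 1 = positive, 0 = negative).
train_texts = ["The movie was wonderful.", "The movie was terrible."]
train_labels = torch.tensor([1, 0])

model.train()
for _ in range(3):  # a few toy epochs
    batch = tokenizer(train_texts, padding=True, return_tensors="pt")
    loss = model(**batch, labels=train_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Apply the fine-tuned model directly to non-English inputs of the same task.
model.eval()
test_texts = ["Der Film war wunderbar.", "La película fue terrible."]
with torch.no_grad():
    batch = tokenizer(test_texts, padding=True, return_tensors="pt")
    predictions = model(**batch).logits.argmax(dim=-1)
print(predictions)  # predicted labels for the German and Spanish sentences
```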

Cited by 157 publications (141 citation statements)
References 17 publications
“…Current crosslingual models work by pre-training multilingual representations using some form of language modeling, which are then fine-tuned on the relevant task and transferred to different languages. Some authors leverage parallel data to that end (Conneau and Lample, 2019; Huang et al., 2019), but training a model akin to BERT (Devlin et al., 2019) on the combination of monolingual corpora in multiple languages is also effective (Conneau et al., 2020). Closely related to our work, Singh et al. (2019) showed that replacing segments of the training data with their translation during fine-tuning is helpful.…”
Section: Related Work (mentioning; confidence: 68%)
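The translation-replacement fine-tuning idea attributed to Singh et al. (2019) in the excerpt above can be sketched as a simple data-augmentation step: some segments of each training sentence are swapped for their translation before fine-tuning. The toy bilingual dictionary and word-level replacement below are assumptions made purely for illustration; the cited work replaces translated segments rather than single dictionary words.

```python
# Sketch of translation-replacement ("code-switch") augmentation for fine-tuning.
import random

EN_DE_DICT = {  # toy dictionary; a real system would use translated segments
    "movie": "Film",
    "wonderful": "wunderbar",
    "terrible": "schrecklich",
}

def code_switch(sentence: str, replace_prob: float = 0.3, seed: int = 0) -> str:
    """Randomly replace dictionary words with their translation."""
    rng = random.Random(seed)
    tokens = sentence.split()
    switched = [
        EN_DE_DICT[tok.lower()]
        if tok.lower() in EN_DE_DICT and rng.random() < replace_prob
        else tok
        for tok in tokens
    ]
    return " ".join(switched)

print(code_switch("the movie was wonderful", replace_prob=1.0))
# -> "the Film was wunderbar"
```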
“…Cao et al. (2020) improve the multilinguality of mBERT by introducing a regularization term in the objective, similar to the creation of static multilingual embedding spaces. Huang et al. (2019) extend mBERT pretraining with three additional tasks and show an improved overall performance. More recently, better multilinguality is achieved by Pfeiffer et al. (2020) (adapters) and Chi et al. (2020) (parallel data).…”
Section: Related Work (mentioning; confidence: 95%)
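The regularization term mentioned for Cao et al. (2020) can be sketched as a penalty pulling together the contextual embeddings of word pairs aligned across a parallel sentence pair. The placeholder embeddings, alignment list, and function name below are assumptions for illustration; in practice the vectors would come from the multilingual encoder, the pairs from a word aligner, and the term would be added to the training loss with a weighting coefficient.

```python
# Sketch of an alignment-style regularization term over parallel sentences.
import torch

def alignment_loss(src_emb: torch.Tensor,
                   tgt_emb: torch.Tensor,
                   alignments: list) -> torch.Tensor:
    """Mean squared L2 distance between aligned source/target word vectors."""
    src_idx = torch.tensor([i for i, _ in alignments])
    tgt_idx = torch.tensor([j for _, j in alignments])
    diff = src_emb[src_idx] - tgt_emb[tgt_idx]
    return (diff ** 2).sum(dim=-1).mean()

# Placeholder contextual embeddings: 5-token English sentence, 6-token German
# translation, hidden size 768.
src_emb = torch.randn(5, 768)
tgt_emb = torch.randn(6, 768)
word_alignments = [(0, 0), (1, 2), (3, 4)]  # (source index, target index)

reg = alignment_loss(src_emb, tgt_emb, word_alignments)
print(reg)
```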
“…They show that vocabulary overlap is not required for multilingual models, and suggest that abstractions shared across languages emerge automatically during pretraining. Another line of research investigates how to further improve this shared knowledge, such as applying post-hoc alignment (Wang et al., 2020b; Cao et al., 2020) and utilizing better calibrated training signal (Mulcaire et al., 2019; Huang et al., 2019). While prior work emphasizes how to share to improve transferability, we study multilingual models from a different perspective of how to unshare to resolve language conflicts.…”
Section: Related Work (mentioning; confidence: 99%)