Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1138

Adversarial Learning with Contextual Embeddings for Zero-resource Cross-lingual Classification and NER

Abstract: Contextual word embeddings (e.g. GPT, BERT, ELMo, etc.) have demonstrated state-of-the-art performance on various NLP tasks. Recent work with the multilingual version of BERT has shown that the model performs very well in cross-lingual settings, even when only labeled English data is used to fine-tune the model. We improve upon multilingual BERT's zero-resource cross-lingual performance via adversarial learning. We report the magnitude of the improvement on the multilingual MLDoc text classification and CoNLL…
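The abstract describes the approach only at a high level. As a rough illustration, below is a minimal sketch of one common way to implement language-adversarial fine-tuning of multilingual BERT: a gradient reversal layer feeds pooled BERT features into a small language discriminator, pushing the encoder toward language-invariant representations. This is not necessarily the authors' exact training setup; the model class, checkpoint name, and hyperparameters are assumptions for illustration.

```python
# Hypothetical sketch of language-adversarial fine-tuning with multilingual BERT.
# Assumes PyTorch and Hugging Face transformers; not the paper's exact recipe.
import torch
import torch.nn as nn
from transformers import BertModel


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialMBert(nn.Module):
    def __init__(self, num_labels, num_languages, lambd=0.1):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        hidden = self.bert.config.hidden_size
        self.task_head = nn.Linear(hidden, num_labels)     # trained on labeled English data
        self.lang_head = nn.Linear(hidden, num_languages)  # language discriminator
        self.lambd = lambd

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        task_logits = self.task_head(pooled)
        # The reversed gradient makes BERT *increase* the discriminator's loss,
        # encouraging embeddings that do not reveal the input language.
        lang_logits = self.lang_head(GradientReversal.apply(pooled, self.lambd))
        return task_logits, lang_logits
```

In such a setup, the task loss would be computed on labeled English batches and the discriminator loss on unlabeled batches in all languages; the gradient reversal layer handles the adversarial direction without a separate generator update.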


Cited by 58 publications (91 citation statements)
References 18 publications
“…Artetxe et al. [1] pretrain a massively multilingual sequence-to-sequence neural MT model, using its encoder as a multilingual text representation for fine-tuning on downstream tasks. Keung et al. [10] apply language-adversarial learning to Multilingual BERT during fine-tuning with unlabeled data. We also considered Multilingual BERT on its own, with self-learning, and with adversarial training as our baselines.…”
Section: Results and Analysis (mentioning)
confidence: 99%
“…They leverage the benefit of contextualized word embeddings by using multilingual BERT (Devlin et al., 2019) as the feature generator, and adopt the GAN framework (Goodfellow et al., 2014) to align the features from the two domains. Keung et al. (2019) show significant improvement over the baseline in which the pretrained multilingual BERT is fine-tuned on the English data alone and tested on the same tasks in other languages. However, Keung et al. (2019), as well as the works mentioned above, are inspired by the pioneering work of Ben-David et al. (2010), which only rigorously studies domain adaptation in the setting of binary classification; there is a lack of theoretical guarantees when it comes to multiclass classification.…”
Section: Introduction (mentioning)
confidence: 89%
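The GAN-style framing mentioned above (multilingual BERT as the feature generator, a language classifier as the discriminator) can also be implemented with explicit alternating updates instead of a gradient reversal layer. The sketch below is a hedged illustration only; `encoder`, `discriminator`, the optimizers, and the batch objects are assumed to be defined elsewhere.

```python
# Hypothetical GAN-style alternative to gradient reversal: alternate updates on
# an unlabeled multilingual batch so the encoder's features stop encoding language.
import torch
import torch.nn.functional as F


def adversarial_step(encoder, discriminator, enc_opt, disc_opt, batch, lang_ids):
    """One alternating update; `encoder` maps a batch to pooled sentence vectors,
    `discriminator` predicts the language id of each vector (both assumed)."""
    # 1) Discriminator step: learn to identify the language of the features.
    feats = encoder(batch).detach()  # block gradients into the encoder
    disc_loss = F.cross_entropy(discriminator(feats), lang_ids)
    disc_opt.zero_grad()
    disc_loss.backward()
    disc_opt.step()

    # 2) Encoder ("generator") step: produce features the discriminator cannot
    #    classify, i.e. maximize its cross-entropy loss.
    gen_loss = -F.cross_entropy(discriminator(encoder(batch)), lang_ids)
    enc_opt.zero_grad()
    gen_loss.backward()
    enc_opt.step()
    return disc_loss.item(), gen_loss.item()
```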
“…Unsupervised domain adaptation provides an appealing solution to many applications where direct access to a massive amount of labeled data is prohibitive or very costly (Sun and Saenko, 2014; Vazquez et al., 2013; Stark et al., 2010; Keung et al., 2019). For example, we often have sufficient labeled data for English, while very limited or even no labeled data are available for many other languages.…”
Section: Introduction (mentioning)
confidence: 99%