Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1449

Do We Really Need Fully Unsupervised Cross-Lingual Embeddings?

Abstract: Recent efforts in cross-lingual word embedding (CLWE) learning have predominantly focused on fully unsupervised approaches that project monolingual embeddings into a shared cross-lingual space without any cross-lingual signal. The lack of any supervision makes such approaches conceptually attractive. Yet, their only core difference from (weakly) supervised projection-based CLWE methods is in the way they obtain a seed dictionary used to initialize an iterative self-learning procedure. The fully unsupervised me…
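The pipeline the abstract refers to can be made concrete in a few lines. The following is a minimal sketch, not the authors' implementation: a projection-based CLWE method that solves an orthogonal Procrustes problem on a seed dictionary, then iteratively re-induces the dictionary from nearest neighbours (fully unsupervised methods differ only in how the initial seed is obtained). Function names and the toy data are illustrative assumptions.

import numpy as np

def procrustes(X, Y):
    # Orthogonal W minimizing ||XW - Y||_F, the standard projection-based
    # CLWE objective: W = U V^T, where X^T Y = U S V^T.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def self_learn(S, T, seed, n_iter=5):
    # S, T: row-normalized source/target embedding matrices.
    # seed: list of (source_index, target_index) dictionary pairs.
    pairs = list(seed)
    for _ in range(n_iter):
        src, tgt = map(list, zip(*pairs))
        W = procrustes(S[src], T[tgt])
        # Re-induce the dictionary: nearest target neighbour of each mapped
        # source word (dot product equals cosine for unit-norm rows).
        sims = (S @ W) @ T.T
        pairs = list(enumerate(sims.argmax(axis=1)))
    return W, pairs

# Toy usage on random "embeddings" (illustrative only).
rng = np.random.default_rng(0)
S = rng.normal(size=(500, 50)); S /= np.linalg.norm(S, axis=1, keepdims=True)
T = rng.normal(size=(500, 50)); T /= np.linalg.norm(T, axis=1, keepdims=True)
W, induced = self_learn(S, T, seed=[(i, i) for i in range(50)])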

Cited by 52 publications (29 citation statements) · References 44 publications

“…Several recent studies (Patra et al., 2019; Ormazabal et al., 2019) criticize this simplified approach, showing that even the embedding spaces of closely related languages are not isometric. Vulić et al. (2019) question the robustness of unsupervised mapping methods in challenging circumstances.…”
Section: Related Work
confidence: 99%
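To make the isometry point concrete: an orthogonal mapping preserves dot products, so if two unit-normalized spaces really were isometric, dictionary words would show identical intra-lingual similarities on both sides. The proxy below is a crude sketch of my own (Patra et al. and related work use more principled measures such as Gromov-Hausdorff distance); a correlation well below 1 signals non-isometry.

import numpy as np

def isometry_proxy(S, T, pairs):
    # S, T: row-normalized embedding matrices; pairs: translation pairs
    # (source_index, target_index). Returns the Pearson correlation between
    # source-side and target-side intra-lingual similarities of the paired
    # words; an exact isometry would give 1.0.
    s = [i for i, _ in pairs]
    t = [j for _, j in pairs]
    A = S[s] @ S[s].T
    B = T[t] @ T[t].T
    iu = np.triu_indices(len(pairs), k=1)
    return float(np.corrcoef(A[iu], B[iu])[0, 1])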
“…As our research questions imply the availability of applicable lexica, unsupervised or weakly supervised approaches for inducing bilingual word embeddings [9,10,11] are only indirectly relevant to our work. However, we plan to compare against them in future work, especially given that claims of comparable or even superior performance of unsupervised methods (e.g., [12]) have been called into question [13,14], in particular when evaluated on actual downstream tasks instead of bilingual lexicon induction [15].…”
Section: Related Work
confidence: 99%
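Bilingual lexicon induction (BLI), the intrinsic benchmark contrasted with downstream tasks here, is usually scored as precision@1 over a held-out test dictionary. A minimal sketch under stated assumptions (plain nearest-neighbour retrieval; evaluations often use CSLS retrieval instead, and all names here are hypothetical):

import numpy as np

def bli_precision_at_1(S_mapped, T, test_dict):
    # S_mapped: mapped, row-normalized source embeddings; T: target
    # embeddings. test_dict maps a source index to the set of acceptable
    # target indices (a word may have several valid translations).
    hits = 0
    for src, golds in test_dict.items():
        pred = int(np.argmax(S_mapped[src] @ T.T))  # nearest neighbour
        hits += pred in golds
    return hits / len(test_dict)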
“…While massively multilingual models have obtained impressive quality improvements for low-resource languages as well as zero-shot scenarios (Aharoni et al., 2019; Arivazhagan et al., 2019a), it has not yet been shown how these massively multilingual models could be extended to unseen languages, beyond the pipelined approaches (Currey and Heafield, 2019; Lakew et al., 2019). On the other hand, self-supervised learning approaches have excelled at downstream cross-lingual transfer (Devlin et al., 2019; Raffel et al., 2019), but their success for unsupervised NMT (Conneau and Lample, 2019; Song et al., 2019) currently lacks robustness when languages are distant or monolingual data domains are mismatched (Neubig and Hu, 2018; Vulić et al., 2019). We observe that these two lines of research can be quite complementary and can compensate for each other's deficiencies.…”
Section: Related Work
confidence: 99%