Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1042
Learning bilingual word embeddings with (almost) no bilingual data

Abstract: Most methods to learn bilingual word embeddings rely on large parallel corpora, which are difficult to obtain for most language pairs. This has motivated an active research line to relax this requirement, with methods that use document-aligned corpora or bilingual dictionaries of a few thousand words instead. In this work, we further reduce the need for bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique. Our method exploits the structur…
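The self-learning approach summarised in the abstract alternates two steps: fit a mapping from the current seed dictionary, then re-induce a larger dictionary from the mapped embeddings. Below is a minimal numpy sketch of that loop, assuming row-aligned, length-normalised embedding matrices X (source) and Z (target); the function names, the orthogonal Procrustes choice of mapping, and n_iters are illustrative assumptions, not the authors' released code.

    import numpy as np

    def orthogonal_map(X, Z, pairs):
        # Orthogonal Procrustes: W = argmin ||X[src] @ W - Z[trg]||_F with W orthogonal,
        # solved in closed form from the SVD of X[src]^T Z[trg].
        src, trg = (list(t) for t in zip(*pairs))
        u, _, vt = np.linalg.svd(X[src].T @ Z[trg])
        return u @ vt

    def induce_dictionary(XW, Z):
        # Re-induce a (noisy) dictionary: pair each mapped source word with its
        # nearest target word by dot product (cosine, given unit-norm rows).
        # For clarity this scores the full vocabulary in one matrix product.
        return list(enumerate((XW @ Z.T).argmax(axis=1)))

    def self_learning(X, Z, seed_pairs, n_iters=10):
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
        pairs = seed_pairs
        for _ in range(n_iters):
            W = orthogonal_map(X, Z, pairs)       # fit mapping on current dictionary
            pairs = induce_dictionary(X @ W, Z)   # expand dictionary for next round
        return W, pairs

Starting from as few as 25 seed pairs (or automatically matched numerals, per the paper), repeating these two steps is the core of the method; convergence criteria and frequency cutoffs are omitted here for brevity.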

Cited by 407 publications (535 citation statements)
References 20 publications

Citation statements:
“…With these considerations in mind, one should wonder how stable the representation of names can be in an embedding space. This question has previously been raised by Artetxe et al. (2017). We address it empirically below.…”
Section: Analysis of POS Composition (mentioning, confidence: 90%)
“…what is the ratio of True Positives to the sum of True Positives and False Positives. (Footnote 1: available at https://github.com/coastalcph/MUSE_dicos.) Data: All systems listed above report results on one or both of two test sets: the MUSE test sets (Conneau et al., 2018) and/or the Dinu test sets (Dinu et al., 2015; Artetxe et al., 2017). Similarly to MUSE, the Dinu dataset was compiled automatically (from Europarl word alignments), but it only covers four languages.…”
Section: Introduction (mentioning, confidence: 99%)
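Since this statement defines precision as the ratio of True Positives to True Positives plus False Positives, here is a hedged sketch of how precision@1 is typically computed against a MUSE/Dinu-style gold test dictionary; XW, Z, and gold are assumed names, with gold mapping each queried source index to its set of acceptable target indices.

    import numpy as np

    def precision_at_1(XW, Z, gold):
        # One prediction per queried source word, so at rank 1 every query is
        # either a TP or an FP and TP / (TP + FP) reduces to the fraction of
        # correct nearest neighbours over the test dictionary.
        nn = (XW @ Z.T).argmax(axis=1)
        queries = list(gold)
        hits = sum(int(nn[s] in gold[s]) for s in queries)
        return hits / len(queries)

Allowing a set of acceptable targets per source word matters because both MUSE and Dinu list multiple valid translations for many entries.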
“…A common method of improving BLI is iteratively expanding the dictionary and refining the mapping matrix as a post-processing step (Artetxe et al., 2017; Lample et al., 2018). Given a learnt mapping matrix, Procrustes refinement first finds… (Footnote *: W_X denotes the set {W_x : …}.)”
Section: Iterative Procrustes Refinement and Hubness Mitigation (mentioning, confidence: 99%)
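The section title couples Procrustes refinement with hubness mitigation; the standard mitigation in this line of work is CSLS retrieval (Conneau et al., 2018). The sketch below assumes unit-norm rows so that dot products are cosines; the function name and the k=10 default are illustrative choices, not a specific system's code.

    import numpy as np

    def csls(XW, Z, k=10):
        # CSLS(x, y) = 2*cos(Wx, y) - r_T(Wx) - r_S(y): discount target words
        # that sit close to everything (hubs) by their mean similarity to their
        # k nearest mapped sources, and symmetrically for source words.
        sims = XW @ Z.T
        r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)  # r_T(Wx), per source row
        r_trg = np.sort(sims, axis=0)[-k:, :].mean(axis=0)  # r_S(y), per target column
        return 2 * sims - r_src[:, None] - r_trg[None, :]

Using csls(XW, Z).argmax(axis=1) in place of a plain nearest-neighbour argmax during dictionary expansion is the usual way hub targets are kept from attracting most source words.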
“…The bilingual dictionaries used in the word embedding alignment contained several thousand word pairs, and the recent study by Artetxe et al. (2017) shows that the dictionary size we operate with should be large enough to reach a high level of alignment accuracy.…”
Section: Cross-lingual Dependency Parsing Model (mentioning, confidence: 99%)