Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1307

Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

Abstract: Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's …
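The abstract is cut off above; in the published paper, the second condition is mean centering (each language's average word vector is zero). As a rough sketch only, not the authors' reference implementation, the alternating procedure looks roughly like this in NumPy; the fixed iteration count and the final re-normalization are assumptions:

```python
import numpy as np

def iterative_normalization(X, n_iters=5):
    """Alternately enforce the two constraints named in the abstract:
    (1) every word vector has unit length;
    (2) the embeddings are mean-centered (average vector is zero).
    X: (vocab_size, dim) matrix of monolingual word embeddings.
    """
    X = X.astype(np.float64).copy()
    for _ in range(n_iters):
        # (1) Length normalization: project each vector onto the unit sphere.
        X /= np.linalg.norm(X, axis=1, keepdims=True)
        # (2) Mean centering: subtract the mean so the average vector is zero.
        X -= X.mean(axis=0, keepdims=True)
    # Final length normalization so constraint (1) holds exactly on exit.
    return X / np.linalg.norm(X, axis=1, keepdims=True)

```

Applying this to each language's embeddings before fitting an orthogonal map is the intended use suggested by the abstract ("transforms monolingual embeddings to make orthogonal alignment easier").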

Cited by 33 publications (25 citation statements)
References 29 publications (48 reference statements)
“…Unsupervised approaches even exploit the assumption twice as their seed extraction is fully based on the topological similarity. Future work should move beyond the restrictive assumption by exploring new methods that can, e.g., 1) increase the isomorphism between monolingual spaces (Zhang et al., 2019) by distinguishing between language-specific and language-pair-invariant subspaces; 2) learn effective non-linear or multiple local projections between monolingual spaces similar to the preliminary work of Nakashole (2018); 3) similar to Vulić and Korhonen (2016) and Lubin et al. (2019) "denoisify" seed lexicons during the self-learning procedure. For instance, keeping only mutual/symmetric nearest neighbour as in FULL+SL+SYM can be seen as a form of rudimentary denoisifying: it is indicative to see that the best overall performance in this work is reported with that model configuration.…”
Section: Further Discussion and Conclusion
confidence: 99%
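The "mutual/symmetric nearest neighbour" filtering mentioned in the quote above is straightforward to sketch. A minimal illustration, assuming the two embedding matrices are already mapped into a shared space and row-normalized; the function name and the cosine-based retrieval are assumptions, not the cited paper's exact procedure:

```python
import numpy as np

def mutual_nn_pairs(X_src, X_tgt):
    """Keep only pairs (i, j) where j is the nearest target neighbour of
    source word i AND i is the nearest source neighbour of target word j.
    X_src, X_tgt: row-normalized embeddings in a shared space."""
    sims = X_src @ X_tgt.T            # cosine similarities (rows normalized)
    fwd = sims.argmax(axis=1)         # best target index for each source word
    bwd = sims.argmax(axis=0)         # best source index for each target word
    return [(i, j) for i, j in enumerate(fwd) if bwd[j] == i]

```

Pairs that survive this symmetric check form a "denoisified" seed lexicon in the sense described above.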
“…Relational similarity As an alternative, we consider a simpler measure inspired by Zhang et al. (2019). This measure, dubbed RSIM, is based on the intuition that the similarity distributions of translations within each language should be similar.…”
Section: Quantifying Isomorphism
confidence: 99%
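The intuition behind RSIM, that translation pairs should have similar within-language similarity distributions, can be sketched as a correlation between the two distributions. This is a hedged reconstruction, assuming a seed dictionary where row i of each matrix is a translation pair, and assuming Pearson correlation over cosine similarities; the cited work defines the measure precisely:

```python
import numpy as np
from scipy.stats import pearsonr

def rsim(X_src, X_tgt):
    """Correlate within-language cosine-similarity distributions over a
    seed dictionary: X_src[i] and X_tgt[i] are assumed to be translations."""
    X_src = X_src / np.linalg.norm(X_src, axis=1, keepdims=True)
    X_tgt = X_tgt / np.linalg.norm(X_tgt, axis=1, keepdims=True)
    iu = np.triu_indices(len(X_src), k=1)   # every unordered word pair
    sims_src = (X_src @ X_src.T)[iu]        # pair similarities, source side
    sims_tgt = (X_tgt @ X_tgt.T)[iu]        # pair similarities, target side
    r, _ = pearsonr(sims_src, sims_tgt)     # higher r = more isomorphic
    return r

```

A high correlation suggests the two monolingual spaces are closer to isomorphic, which is exactly the property orthogonal mapping relies on.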
“…Corpus size It has become standard to align monolingual word embeddings trained on Wikipedia (Zhang et al., 2019). As can be seen in Figure 1, and also in Table 1, Wikipedias of low-resource languages are more than a magnitude smaller than Wikipedias of high-resource languages.…”
Section: Isomorphism and Learning
confidence: 99%