Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.257
Are All Good Word Vector Spaces Isomorphic?

Abstract: Existing algorithms for aligning cross-lingual word vector spaces assume that vector spaces are approximately isomorphic. As a result, they perform poorly or fail completely on non-isomorphic spaces. Such non-isomorphism has been hypothesised to result from typological differences between languages. In this work, we ask whether non-isomorphism is also crucially a sign of degenerate word vector spaces. We present a series of experiments across diverse languages which show that variance in performance across lang…
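To make the isomorphism assumption in the abstract concrete, here is a minimal sketch of the orthogonal Procrustes mapping that most alignment algorithms of this kind rely on. It is not taken from the paper; the NumPy formulation, array names, and toy data are assumptions for illustration only.

```python
import numpy as np

def orthogonal_procrustes(X_src, Y_tgt):
    """Return the orthogonal matrix W minimising ||X_src @ W - Y_tgt||_F.

    X_src, Y_tgt: (n, d) arrays of word vectors for n seed translation pairs.
    The closed-form solution is W = U @ Vt, where U, S, Vt is the SVD of X^T Y.
    """
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt

# Toy example: the "target" space is an exact rotation of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                         # source-language vectors
W_true = np.linalg.qr(rng.normal(size=(4, 4)))[0]   # a hidden orthogonal map
Y = X @ W_true                                      # target vectors: an isomorphic copy
W = orthogonal_procrustes(X, Y)
print(np.allclose(X @ W, Y))                        # True, because the spaces are isomorphic
```

When the two spaces are only approximately isomorphic the residual of this fit grows, and on strongly non-isomorphic spaces no orthogonal map fits well, which is the failure mode the abstract refers to.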

Cited by 29 publications (26 citation statements)
References 36 publications
“…The differences in embedding spaces of different languages do not only depend on linguistic properties of the languages in consideration, but also on other factors such as the chosen training algorithm, underlying training domain, or training data size and quality (Arora et al., 2019; Vulić et al., 2020). In future research we also plan an in-depth study of these factors and their relation to our spectral analysis.…”
Section: Further Discussion and Conclusion
Citation type: mentioning (confidence: 97%)
“…Our method can be adjusted for multitask and multilingual settings. Following the observation that an orthogonal transformation can map distributions of embeddings in typologically close languages (Mikolov et al., 2013; Vulić et al., 2020), we think that joint training for many languages may be possible by keeping the same Scaling Vector and adding a separate Orthogonal Transformation per language, fulfilling the role of orthogonal mappings.…”
Section: Further Work
Citation type: mentioning (confidence: 93%)
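As a purely illustrative sketch of the parameterisation this statement describes (one scaling vector shared across languages plus a separate orthogonal transformation per language), the mapping could be written as follows. The language codes, dimensionality, and NumPy setup are hypothetical, and only the mapping is shown, not any training procedure.

```python
import numpy as np

def random_orthogonal(d, rng):
    """A random d x d orthogonal matrix (QR decomposition of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

d = 4
rng = np.random.default_rng(1)
scale = rng.uniform(0.5, 1.5, size=d)              # one scaling vector shared by all languages
ortho_maps = {lang: random_orthogonal(d, rng)      # one orthogonal transformation per language
              for lang in ("en", "de", "fi")}

def map_to_shared_space(vectors, lang):
    """Rotate a language's vectors with its own orthogonal map,
    then apply the shared scaling vector."""
    return (vectors @ ortho_maps[lang]) * scale

en_vecs = rng.normal(size=(3, d))                  # three toy English word vectors
print(map_to_shared_space(en_vecs, "en").shape)    # (3, 4)
```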
“…An empirical measure of semantic proximity between two languages is often computed using the degree of isomorphism, that is, how similar the structures of two languages are in topological space (Søgaard et al., 2018). Research in cross-lingual transfer tasks shows that linguistic differences across languages often make spaces depart from isomorphism (Nakashole and Flauger, 2018; Søgaard et al., 2018; Patra et al., 2019; Vulić et al., 2020). While this degrades the quality of bilingual embeddings, it is a desired characteristic in our case: since our task involves processing of (multi-view) representations of monolingual text, departures from isomorphism indicate diversity in the source that generates them.…”
Section: Measuring Isomorphism
Citation type: mentioning (confidence: 99%)
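For readers unfamiliar with how a "degree of isomorphism" is quantified, below is a simplified sketch in the spirit of the eigenvector similarity measure of Søgaard et al. (2018): build a nearest-neighbour graph over each vector space and compare the eigenvalues of the two graph Laplacians. The neighbourhood size, the number of eigenvalues compared, and the toy data are assumptions; lower values indicate more nearly isomorphic spaces.

```python
import numpy as np

def laplacian_eigenvalues(vectors, n_neighbors=3):
    """Eigenvalues of the graph Laplacian of a k-nearest-neighbour graph
    built from cosine similarities between the given word vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = (vectors @ vectors.T) / (norms * norms.T)
    adj = np.zeros_like(sims)
    for i, row in enumerate(sims):
        nn = np.argsort(-row)
        nn = nn[nn != i][:n_neighbors]      # nearest neighbours, excluding the word itself
        adj[i, nn] = 1.0
        adj[nn, i] = 1.0                    # keep the graph undirected
    laplacian = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(laplacian)    # eigenvalues in ascending order

def eigenvector_similarity(X, Y, k=5):
    """Sum of squared differences of the k smallest Laplacian eigenvalues;
    lower values mean the two spaces are closer to isomorphic."""
    ex, ey = laplacian_eigenvalues(X), laplacian_eigenvalues(Y)
    return float(np.sum((ex[:k] - ey[:k]) ** 2))

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))
Y = X @ np.linalg.qr(rng.normal(size=(4, 4)))[0]          # an isomorphic (rotated) copy of X
print(eigenvector_similarity(X, Y))                        # ~0 for isomorphic spaces
print(eigenvector_similarity(X, rng.normal(size=(20, 4)))) # typically larger for unrelated spaces
```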