Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization

Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie; Boyd-Graber, Jordan

doi:10.48550/arxiv.1906.01622

Cited by 7 publications

(4 citation statements)

References 24 publications


“…For mapping across different embedding spaces, we use vecmap toolkit. We follow Zhang et al (2019) to pre-process the embeddings, i.e., the embeddings are unit-normed, mean-centered and unit-normed again. For bilingual induction, we follow the steps outlined by (Artetxe et al, 2018a), i.e., whitening each space, and solving Procrustes.…”

Section: Vecmap Toolkit (mentioning)

confidence: 99%

GRI: Graph-based Relative Isomorphism of Word Embedding Spaces

Ali, Hu, Qin et al. 2023

Findings of the Association for Computational Linguistics: EMNLP 2023


Automated construction of bilingual dictionaries using monolingual embedding spaces is a core challenge in machine translation. The end performance of these dictionaries relies upon the geometric similarity of individual spaces, i.e., their degree of isomorphism. Existing attempts aimed at controlling the relative isomorphism of different spaces fail to incorporate the impact of semantically related words in the training objective. To address this, we propose GRI that combines the distributional training objectives with attentive graph convolutions to unanimously consider the impact of semantically similar words required to define/compute the relative isomorphism of multiple spaces. Experimental evaluation shows that GRI outperforms the existing research by improving the average P@1 by a relative score of up to 63.6%. We release the codes for GRI at https://github.com/asif6827/GRI.
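
As a rough illustration of the pre-processing quoted in the citation statement above (unit-norm, mean-center, unit-norm again) followed by an orthogonal Procrustes solve, here is a minimal NumPy sketch. It is not the vecmap implementation: the function names are hypothetical, rows are assumed to be word vectors, the two matrices are assumed to be row-aligned by a seed dictionary, and the whitening step mentioned in the quote is omitted for brevity.

```python
import numpy as np


def unit_norm(X):
    """Scale every row (word vector) to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)  # guard against zero vectors


def preprocess(X):
    """Unit-norm, mean-center, then unit-norm again (one round)."""
    X = unit_norm(X)
    X = X - X.mean(axis=0, keepdims=True)
    return unit_norm(X)


def procrustes(X_src, Y_tgt):
    """Orthogonal W minimizing ||X_src @ W - Y_tgt||_F.

    Row i of X_src and row i of Y_tgt are assumed to be a translation pair.
    The solution is W = U @ Vt, where U, S, Vt is the SVD of X_src.T @ Y_tgt.
    """
    U, _, Vt = np.linalg.svd(X_src.T @ Y_tgt)
    return U @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = preprocess(rng.normal(size=(1000, 50)))  # stand-in source vectors
    tgt = preprocess(rng.normal(size=(1000, 50)))  # stand-in target vectors
    W = procrustes(src, tgt)
    print(W.shape)
```

Because `W` is constrained to be orthogonal, the mapping preserves the row norms and pairwise dot products of the source vectors, which is why the normalization is applied to both spaces before the solve.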


“…We use VecMap toolkit for mapping across different embedding spaces. For this, we pre-process the embeddings using a process flow outlined by Zhang et al (2019). The embeddings are unit-normed and mean-centered, followed by another round of unit-normalization.…”

Section: Vecmap Toolkit (mentioning)

confidence: 99%

GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings

Ali, Alshmrani, Qin et al. 2023

Proceedings of ArabicNLP 2023


Bilingual Lexical Induction (BLI) is a core challenge in NLP; it relies on the relative isomorphism of individual embedding spaces. Existing attempts aimed at controlling the relative isomorphism of different embedding spaces fail to incorporate the impact of semantically related words in the model training objective. To address this, we propose GARI, which combines the distributional training objectives with multiple isomorphism losses guided by the graph attention network. GARI considers the impact of semantic variations of words in order to define the relative isomorphism of the embedding spaces. Experimental evaluation using the Arabic language data set shows that GARI outperforms the existing research by improving the average P@1 by a relative score of up to 40.95% and 76.80% for in-domain and domain mismatch settings respectively. We release the codes for GARI at https://github.com/asif6827/GARI.


“…Tagowski et al [17] applies the embedding alignment technique to the graph domain, where they align a set of node2vec embeddings [4] learned over different snapshots of an evolving graph. However, all these methods assume the embeddings are fixed, which could result in a large alignment error if two sets of pretrained embeddings are very distinct [23]. Unlike these methods, we jointly learn the embeddings along with the backward transformation function, achieving much better alignment performance and better unintended task performance.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Backward Compatible Embeddings

Hu, Bansal, Cao et al. 2022

Preprint


Embeddings, low-dimensional vector representations of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However, as the embedding model gets updated and retrained to improve performance on the intended task, the newly-generated embeddings are no longer compatible with the existing consumer models. This means that historical versions of the embeddings can never be retired or all consumer teams have to retrain their models to make them compatible with the latest version of the embeddings, both of which are extremely costly in practice. Here we study the problem of embedding version updates and their backward compatibility. We formalize the problem where the goal is for the embedding team to keep updating the embedding version, while the consumer teams do not have to retrain their models. We develop a solution based on learning backward compatible embeddings, which allows the embedding model version to be updated frequently, while also allowing the latest version of the embedding to be quickly transformed into any backward compatible historical version of it, so that consumer teams do not have to retrain their models. Our key idea is that whenever a new embedding model is trained, we learn it together with a light-weight backward compatibility transformation that aligns the new embedding to the previous version of it. Our learned backward transformations can then be composed to produce any historical version of embedding. Under our framework, we explore six methods and systematically evaluate them on a real-world recommender system application. We show that the best method, which we call BC-Aligner, maintains backward compatibility with existing unintended tasks even after multiple model version updates. Simultaneously, BC-Aligner achieves the intended task performance similar to the embedding model that is solely optimized for the intended task.
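
The idea of composing backward transformations to reach any historical version can be sketched as follows. This is only an illustration under stated assumptions: each transformation is assumed, purely for the example, to be a linear map stored as a matrix, and the names `to_version` and `backward_maps` are hypothetical rather than taken from the paper.

```python
import numpy as np


def to_version(emb, backward_maps, current, target):
    """Map embeddings from version `current` back to version `target`.

    backward_maps[k] is assumed to be a (d_k, d_{k-1}) matrix B_k such that
    E_{k-1} is approximately E_k @ B_k (rows are objects). Reaching an older
    version means applying B_current, B_{current-1}, ..., B_{target+1} in turn.
    """
    if not 0 <= target <= current:
        raise ValueError("target must satisfy 0 <= target <= current")
    out = emb
    for k in range(current, target, -1):
        out = out @ backward_maps[k]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical maps: version 2 -> 1 and version 1 -> 0.
    maps = {1: rng.normal(size=(8, 8)), 2: rng.normal(size=(8, 8))}
    e2 = rng.normal(size=(4, 8))  # four objects embedded with version 2
    e0_hat = to_version(e2, maps, current=2, target=0)
    print(e0_hat.shape)
```

With linear maps the chain could also be pre-multiplied once into a single matrix per target version, so a consumer model pinned to an old version pays for only one matrix product per lookup.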
