Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1152

Learning Translations via Matrix Completion

Abstract: Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel corpora. We model this task as a matrix completion problem, and present an effective and extendable framework for completing the matrix. This method harnesses diverse bilingual and monolingual signals, each of which may be incomplete or noisy. Our model achieves state-of-the-art performance for both high- and low-resource languages.
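The matrix-completion framing of the abstract can be illustrated with a small sketch. This is a toy example with a generic low-rank completion, not the paper's actual model: seed-dictionary entries fill a few cells of a source-by-target word matrix, and a rank-k factorization fit only on the observed cells scores the missing ones.

```python
import numpy as np

# Toy translation matrix: rows = source words, columns = target words.
# 1.0 = observed translation pair (e.g., from a seed dictionary); NaN = unknown.
src = ["gato", "perro", "casa"]
tgt = ["cat", "dog", "house"]
M = np.array([
    [1.0, np.nan, np.nan],
    [np.nan, 1.0, np.nan],
    [np.nan, np.nan, 1.0],
])

# Rank-k completion by alternating least squares over observed cells only.
rng = np.random.default_rng(0)
k = 2                       # assumed factor rank (illustrative choice)
lam = 0.1                   # assumed L2 regularization strength
U = rng.normal(scale=0.1, size=(len(src), k))   # source-word factors
V = rng.normal(scale=0.1, size=(len(tgt), k))   # target-word factors
obs = ~np.isnan(M)          # mask of observed entries

for _ in range(50):
    # Fix V, solve the regularized least-squares problem for each row of U.
    for i in range(len(src)):
        cols = obs[i]
        if cols.any():
            Vc = V[cols]
            U[i] = np.linalg.solve(Vc.T @ Vc + lam * np.eye(k), Vc.T @ M[i, cols])
    # Fix U, solve for each row of V symmetrically.
    for j in range(len(tgt)):
        rows = obs[:, j]
        if rows.any():
            Ur = U[rows]
            V[j] = np.linalg.solve(Ur.T @ Ur + lam * np.eye(k), Ur.T @ M[rows, j])

scores = U @ V.T  # higher score = more plausible translation pair
```

The completed `scores` matrix assigns a value to every (source, target) cell, including the ones that were unobserved, which is exactly what lets the framework rank translation candidates for words outside the seed dictionary.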

Cited by 13 publications (5 citation statements). References 32 publications.
“…Upadhyay et al. (2016) obtain evaluation sets for the task across 26 languages from the Open Multilingual WordNet (Bond & Foster, 2013), while Levy et al. (2017) obtain bilingual dictionaries from Wiktionary for Arabic, Finnish, Hebrew, Hungarian, and Turkish. More recently, Wijaya, Callahan, Hewitt, Gao, Ling, Apidianaki, and Callison-Burch (2017) build evaluation data for 28 language pairs (where English is always the target language) by semi-automatically translating all Wikipedia words with frequency above 100. Most previous work (Vulić & Moens, 2013a; Mikolov et al., 2013b) filters source and target words based on part-of-speech, though this simplifies the task and introduces bias in the evaluation.…”
Section: Extrinsic Tasks
confidence: 99%
“…To combat the issue of data starvation, many researchers aim to utilize monolingual data to train NMT systems (Lample et al., 2018a; Artetxe et al., 2018; Conneau and Lample, 2019) and to generate more training data, either comparable or synthetic. Comparable data are extracted using various bitext retrieval methods (Zhao and Vogel, 2002; Fan et al., 2021; Kocyigit et al., 2022), multimodal signals (Hewitt et al., 2018; Rasooli et al., 2021), or dictionary- or knowledge-based approaches (Wijaya and Mitchell, 2016; Wijaya et al., 2017; Tang and Wijaya, 2022), while synthetic data are created through training-data augmentation (Kuwanto et al., 2021), automatic backtranslation (Sennrich et al., 2016a; Wang et al., 2019), or outright generation with generative models (Lu et al., 2023), an approach that has lately gained increasing attention from the community due to the advancement of large language models (LLMs).…”
Section: Augmenting Training for NMT
confidence: 99%
“…So, a word in the target language is a translation candidate of a word in the source language if it tends to co-occur with the pairs of words from the seed words. A slightly different strategy is reported in Wijaya et al. (2017), where the learning task is modeled as a matrix completion problem with source words in the columns and target words in the rows. More precisely, starting from some observed translations (e.g., from existing bilingual dictionaries), the method infers missing translations in the matrix via matrix factorization with a Bayesian Personalized Ranking objective.…”
Section: Cross-lingual Word Similarity from Monolingual Corpora
confidence: 99%
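The matrix-factorization-with-BPR step described in the statement above can be sketched as follows. This is a toy re-implementation under assumed hyperparameters (factor rank, learning rate, regularization), not the authors' code: for each observed (source, target) pair, a non-translation target is sampled, and the factor matrices are updated by SGD so that the observed translation outranks the sampled negative under the logistic (BPR) ranking loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_tgt, k = 4, 4, 2           # toy sizes; k is an assumed factor rank
# Observed (source, target) translation pairs, e.g. from a seed dictionary.
observed = {(0, 0), (1, 1), (2, 2), (3, 3)}
pairs = sorted(observed)

U = rng.normal(scale=0.1, size=(n_src, k))   # source-word factors
V = rng.normal(scale=0.1, size=(n_tgt, k))   # target-word factors
lr, reg = 0.05, 0.01                 # assumed SGD step size and L2 penalty

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(2000):
    i, j = pairs[rng.integers(len(pairs))]
    # Sample a negative target j_neg that is not a known translation of i.
    j_neg = rng.integers(n_tgt)
    while (i, j_neg) in observed:
        j_neg = rng.integers(n_tgt)
    # BPR: maximize log sigmoid(score(i, j) - score(i, j_neg)).
    x = U[i] @ (V[j] - V[j_neg])
    g = sigmoid(-x)                  # gradient scale of the ranking loss
    U[i] += lr * (g * (V[j] - V[j_neg]) - reg * U[i])
    V[j] += lr * (g * U[i] - reg * V[j])
    V[j_neg] += lr * (-g * U[i] - reg * V[j_neg])

scores = U @ V.T  # each row ranks target words for one source word
```

Because the objective is pairwise, the model only needs positive (observed) translations plus sampled negatives; no explicit "non-translation" labels are required, which suits the sparse seed dictionaries the statement mentions.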