Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining

Kvapilíková, Ivana; Artetxe, Mikel; Labaka, Gorka; Agirre, Eneko; Bojar, Ondřej

doi:10.18653/v1/2020.acl-srw.34

Cited by 28 publications

(45 citation statements)

References 24 publications

“…
                            En-Es   Zh-Kz
(Hangya and Fraser, 2019)   18.5    21.6
(Keung et al, 2020)         20.2    22.8
(Hangya et al, 2018)        16.3    19.3
(Kvapilíková et al, 2020)   23.6    22.7
Proposed method             24.3    25.8

We use openNMT to train the machine translation system. The results are as in Table 3.…”

Section: Methods (mentioning, confidence: 99%)

“…This transfer learning method inspired our work and the main difference is that they required bilingual supervision (e.g, bilingual lexicon, parallel sentences), which is not available for many low-resource language pairs. Recently, several works developed unsupervised method[s] to mine parallel data (Hangya et al, 2018; Hangya and Fraser, 2019; Kvapilíková et al, 2020; Keung et al, 2020). These approaches mainly rely on unsupervised cross-lingual embeddings (Artetxe et al, 2018; Lample and Conneau, 2019) that [can] be trained on monolingual corpora.…”

Section: Related Work (mentioning, confidence: 99%)
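
The statement above notes that unsupervised mining approaches rely on cross-lingual sentence embeddings. The sketch below illustrates only the generic mining step, and it is not the method of Kvapilíková et al. or of any other cited paper. It assumes that source and target sentences have already been embedded into one shared cross-lingual space.

Candidate pairs are scored with the ratio margin criterion described by Artetxe and Schwenk (2019, "Margin-based Parallel Corpus Mining with Multilingual Sentence Embeddings"). The score is the cosine similarity of the pair divided by the average similarity of each side to its k nearest neighbours in the other language.

The helper names (`normalize`, `knn_mean`, `mine_pairs`) are hypothetical. So are the default `k` and the acceptance `threshold`, which are illustrative rather than tuned values.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def knn_mean(query: Vector, candidates: List[List[float]], k: int) -> float:
    """Average cosine similarity between `query` and its k nearest candidates."""
    sims = sorted((cosine(query, c) for c in candidates), reverse=True)[:k]
    return sum(sims) / len(sims)


def mine_pairs(
    src: List[Vector], tgt: List[Vector], k: int = 4, threshold: float = 1.0
) -> List[Tuple[int, int, float]]:
    """Pair each source sentence with its best target under the ratio margin.

    Hypothetical helper: assumes src/tgt embeddings already share one
    cross-lingual space; `k` and `threshold` are illustrative defaults.
    """
    src_n = [normalize(v) for v in src]
    tgt_n = [normalize(v) for v in tgt]
    # Neighbourhood averages in both directions (the margin's denominator terms).
    src_nn = [knn_mean(x, tgt_n, k) for x in src_n]
    tgt_nn = [knn_mean(y, src_n, k) for y in tgt_n]

    pairs = []
    for i, x in enumerate(src_n):
        best_j, best_score = -1, -math.inf
        for j, y in enumerate(tgt_n):
            denom = (src_nn[i] + tgt_nn[j]) / 2
            if denom <= 0:
                continue  # the ratio margin is undefined for non-positive neighbourhoods
            score = cosine(x, y) / denom
            if score > best_score:
                best_j, best_score = j, score
        # Keep the best candidate only if its margin clears the threshold.
        if best_j >= 0 and best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs
```

In a toy call such as `mine_pairs([[1, 0, 0], [0, 1, 0]], [[0.9, 0.1, 0], [0.1, 0.9, 0], [0, 0, 1]], k=2)`, each source vector is paired with the target vector it nearly points at. The unrelated third target is never selected. Dividing by neighbourhood similarity, rather than using raw cosine, penalises "hub" sentences that are close to everything.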

“…supervision (e.g, bilingual lexicon or sentences), which is not available for low-resource language pairs. Although (Kvapilíková et al, 2020) solved the supervised limitation by employing an unsupervised MT, the performance heavily depended on MT's quality.…”

Section: Introduction (mentioning, confidence: 99%)

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

2021

“…Baselines: In our experiments, we consider supervised baselines (Bouamor and Sajjad, 2018; Schwenk, 2018; Artetxe and Schwenk, 2019). We also compare several unsupervised baselines which contains (Hangya and Fraser, 2019; Keung et al, 2020; Hangya et al, 2018; Kvapilíková et al, 2020).…”

Section: En-Fr (mentioning, confidence: 99%)

“…However, their method is not unsupervised and relies on bilingual supervision (e.g, bilingual lexicon or sentences), which is not available for low-resource language pairs. Although (Kvapilíková et al, 2020) solved the supervised limitation by employing an unsupervised MT, the performance heavily depended on MT's quality.…”

Section: Introduction (mentioning, confidence: 99%)

Parallel sentences mining with transfer learning in an unsupervised setting

Sun, Zhu, Feng, et al. (2021)

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

The quality and quantity of parallel sentences are known to be very important for training neural machine translation (NMT) systems. However, such resources are not available for many low-resource language pairs. Many existing methods require strong supervision and hence are not suitable. Although there have been several attempts at developing unsupervised models, they ignore the language-invariant properties shared between languages. In this paper, we propose an approach based on transfer learning to mine parallel sentences in an unsupervised setting. With the help of bilingual corpora of rich-resource language pairs, we can mine parallel sentences for low-resource language pairs without bilingual supervision. Experiments show that our approach improves parallel sentence mining performance compared with previous methods. In particular, we achieve good results on two real-world low-resource language pairs.

An Explainable Evaluation of Unsupervised Transfer Learning for Parallel Sentences Mining

Zhu, Shi (2021)

Web and Big Data
