Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020
DOI: 10.18653/v1/2020.emnlp-main.187

Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations

Abstract: Sparse language vectors from linguistic typology databases and learned embeddings from tasks like multilingual machine translation have been investigated in isolation, without analysing how they could benefit from each other's language characterisation. We propose to fuse both views using singular vector canonical correlation analysis and study what kind of information is induced from each source. By inferring typological features and language phylogenies, we observe that our representations embed typology and…

Citations: cited by 27 publications (32 citation statements)
References: 39 publications (57 reference statements)
“…To construct linguistic distances (Hajič, 2000; Oncevay et al., 2020), some explore typological distance (Chowdhury et al., 2020; Rama and Kolachina, 2012; Pienemann et al., 2005; Svalberg and Chuchu, 1998; Hansen et al., 2012; Comrie, 2005), lexical distance (Huang et al., 2007), Levenshtein distance and Jaccard distance (Serva and Petroni, 2008; Holman et al., 2008; Adebara et al., 2020), sonority distance (Parker, 2012) and spectral distance (Dubossarsky et al., 2020).…”
Section: Linguistic Distance (mentioning)
confidence: 99%
“…SP08 was constructed by computing the Levenshtein (edit) distance between words from an open cross-lingual list (Dyen et al., 1992) to compare linguistic divergence through time and thus partially encodes lexical similarity of languages (Oncevay et al., 2020). Rabinovich et al. (2017) also acknowledges that SP08 has been disputed and researchers have not yet agreed on a commonly accepted tree of the Indo-European languages (Ringe et al., 2002).…”
Section: Phylogenetics and Shining-through (mentioning)
confidence: 99%
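
The statement above describes a language distance built from Levenshtein (edit) distances between words in a cross-lingual list. A minimal sketch of that idea follows; it is not claimed to reproduce how SP08 was actually built. The function names, the length normalisation, and the averaging over aligned word pairs are all assumptions made for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion from a
                current[j - 1] + 1,      # insertion into a
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def language_distance(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Mean length-normalised edit distance over aligned word lists.

    Assumption: words_a[k] and words_b[k] express the same concept, and each
    pair is normalised by the length of the longer word.
    """
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    total = 0.0
    for word_a, word_b in zip(words_a, words_b):
        longest = max(len(word_a), len(word_b))
        total += levenshtein(word_a, word_b) / longest if longest else 0.0
    return total / len(words_a)
```

For example, `language_distance(["night"], ["nacht"])` compares one English/German pair that differs in two of five characters. Pairwise distances of this kind could then be fed to a tree-building method to obtain a phylogeny.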
“…Many of these approaches use language embeddings with sparse features encoding WALS feature values. Oncevay et al. (2020) find that combining information from typological databases with embeddings learned during training of an NMT model can be beneficial for multilingual NMT.…”
Section: Typologically Informed Sharing (mentioning)
confidence: 96%