Proceedings of the Third Workshop on Computational Typology and Multilingual NLP 2021
DOI: 10.18653/v1/2021.sigtyp-1.7

Family of Origin and Family of Choice: Massively Parallel Lexiconized Iterative Pretraining for Severely Low Resource Text-based Translation

Abstract: We translate a closed text that is known in advance into a severely low resource language by leveraging massive source parallelism. In other words, given a text in 124 source languages, we translate it into a severely low resource language using only ∼1,000 lines of low resource data without any external help. Firstly, we propose a systematic method to rank and choose source languages that are close to the low resource language. We call the linguistic definition of language family Family of Origin (FAMO), and …

Cited by 3 publications (8 citation statements)
References: 59 publications
“…We take the most centered translation for every sentence, $\max_i \sum_j S_{ij}$, to build the combined translation output. The expectation of the combined score is higher than that of any of the source languages (Zhou and Waibel, 2021).…”
Section: Methods | Citation type: mentioning
confidence: 72%
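The selection rule quoted above can be illustrated with a minimal sketch. It assumes that each candidate is a translation of the same sentence produced from a different source language, and that S is a pairwise similarity between candidates. The Jaccard token overlap used here is only a hypothetical stand-in, since the quoted statement does not say which similarity measure is used; the function names are likewise illustrative.

```python
def jaccard(a, b):
    """Hypothetical stand-in similarity: token-set overlap of two strings."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def most_centered(candidates, similarity=jaccard):
    """Return the candidate with the largest summed similarity to the others.

    Self-similarity is skipped; for non-empty candidates it is the same
    constant under Jaccard overlap, so it would not change which candidate wins.
    """
    scores = []
    for i, cand_i in enumerate(candidates):
        total = sum(
            similarity(cand_i, cand_j)
            for j, cand_j in enumerate(candidates)
            if j != i
        )
        scores.append(total)
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Three made-up candidate translations of one sentence.
candidates = ["the cat sat", "the cat sat down", "the dog ran"]
best, scores = most_centered(candidates)
print(best)  # the cat sat
```

Ties are broken in favour of the earliest candidate, which is an arbitrary choice made for this sketch.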
“…We train our models using a state-of-the-art multilingual transformer by adding language labels to each source sentence (Johnson et al., 2017; Ha et al., 2016; Zhou et al., 2018a,b). We borrow the order-preserving named entity translation method by replacing each named entity with __NEs (Zhou et al., 2018b) using a multilingual lexicon table that covers 124 source languages and 2,939 named entities (Zhou and Waibel, 2021). For example, the sentence "Somchai calls Juan" is transformed to "__opt_src_en __opt_tgt_ca __NE0 calls __NE1" to translate to Chuj.…”
Section: Methods | Citation type: mentioning
confidence: 99%
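The transformation in the quoted example can be sketched roughly as follows. This is an illustration under stated assumptions, not the cited authors' implementation: the function name `tag_and_mask` is hypothetical, the small entity set stands in for the multilingual lexicon table, and numbering placeholders by order of first appearance (with a repeated entity reusing its placeholder) is an assumption inferred from the single example in the quote.

```python
import re


def tag_and_mask(sentence, named_entities, src_tag, tgt_tag):
    """Prefix language tags and replace known named entities with __NE<k>.

    Placeholders are numbered in order of first appearance, so the entity
    order of the sentence is preserved in the masked output.
    """
    mapping = {}

    def replace(match):
        entity = match.group(0)
        if entity not in mapping:
            mapping[entity] = f"__NE{len(mapping)}"
        return mapping[entity]

    masked = sentence
    if named_entities:
        # Longest entities first, so a multi-word name wins over its parts.
        ordered = sorted(named_entities, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b"
        )
        masked = pattern.sub(replace, sentence)
    return f"{src_tag} {tgt_tag} {masked}", mapping


# Hypothetical two-entry lexicon; the tags are copied from the quoted example.
lexicon = {"Somchai", "Juan"}
tagged, mapping = tag_and_mask(
    "Somchai calls Juan", lexicon, "__opt_src_en", "__opt_tgt_ca"
)
print(tagged)  # __opt_src_en __opt_tgt_ca __NE0 calls __NE1
```

The returned mapping records which surface form each placeholder replaced, which a later step could use to substitute target-language entity names back into the output.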