2020
DOI: 10.1007/s10590-020-09258-6
Extremely low-resource neural machine translation for Asian languages

Abstract: This paper presents a set of effective approaches to handle extremely low-resource language pairs for self-attention-based neural machine translation (NMT), focusing on English and four Asian languages. Starting from an initial set of parallel sentences used to train bilingual baseline models, we introduce additional monolingual corpora and data processing techniques to improve translation quality. We describe a series of best practices and empirically validate the methods through an evaluation conducted on eig…

Cited by 19 publications (14 citation statements)
References 39 publications

“…We used the fairseq framework [43] with the Transformer [60] architecture with 6 layer encoder (except for Filipino where 4 encoder layers were sufficient), 6 layer decoder and 1 attention head, decided through hyperparameter tuning as suggested by Rubino et al [48]. Dropout of 0.1 and label smoothing of 0.1 is used.…”
Section: NMT Settings
confidence: 99%
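A minimal sketch of the configuration quoted above, using PyTorch's nn.Transformer as a stand-in for fairseq's Transformer implementation. The quote fixes only the layer counts, the single attention head, the dropout, and the label smoothing; the model dimension and feed-forward width below are assumptions, not values reported by the citing paper.

import torch
import torch.nn as nn

# Sketch of the cited setup: 6-layer encoder, 6-layer decoder,
# a single attention head, dropout 0.1, label smoothing 0.1.
model = nn.Transformer(
    d_model=512,              # assumed embedding size
    nhead=1,                  # single attention head, as quoted
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,     # assumed feed-forward width
    dropout=0.1,
    batch_first=True,
)

# Label-smoothed cross-entropy matching the quoted smoothing value.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Shape check with dummy token embeddings: (batch, seq, d_model).
src = torch.randn(2, 7, 512)
tgt = torch.randn(2, 5, 512)
print(model(src, tgt).shape)  # torch.Size([2, 5, 512])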
“…Considerable technical constraints exist in training algorithmic technologies to detect harmful content in a wide range of languages. Martinus and Abbott (2019) underlined some of the primary obstacles in machine translation for African languages such as limited availability and discoverability. The lack of language training also comes from the difficulty in accessing resources in uncommon languages.…”
Section: Inequalities Between Priority and Marginal Markets
confidence: 99%
“…Monolingual data is often more freely available than bilingual data for low resource translation, as for small-domain translation. Data-centric approaches to low-resource language translation may therefore adapt to semi-synthetic datasets created by forward- or back-translation (Rubino et al., 2020; Karakanta et al., 2018). Even if this pseudo-parallel data is generated using a relatively weak low resource translation system, it still may be beneficial for further tuning that system (Currey & Heafield, 2019).…”
Section: Improving Low Resource Language Translation by Adaptation
confidence: 99%
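As a rough illustration of the back-translation idea described in the statement above, the sketch below builds pseudo-parallel pairs from monolingual target-side text. The reverse_model callable is purely hypothetical, standing in for any trained target-to-source translation system; the point is that the target side of each pair remains genuine text even when the synthetic source side is noisy.

from typing import Callable, Iterable, List, Tuple

def back_translate(
    reverse_model: Callable[[str], str],
    target_monolingual: Iterable[str],
) -> List[Tuple[str, str]]:
    # Translate each monolingual target-side sentence back into the
    # source language, pairing the (possibly noisy) synthetic source
    # with the genuine target sentence.
    return [(reverse_model(t), t) for t in target_monolingual]

# Toy usage: a dummy callable stands in for a real target-to-source
# NMT model; in practice this would be a trained reverse system.
if __name__ == "__main__":
    dummy_reverse = lambda s: "<synthetic source of: " + s + ">"
    mono = ["Magandang umaga.", "Salamat po."]
    for src, tgt in back_translate(dummy_reverse, mono):
        print(src, "|||", tgt)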