Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) 2019
DOI: 10.18653/v1/w19-5311

The TALP-UPC Machine Translation Systems for WMT19 News Translation Task: Pivoting Techniques for Low Resource MT

Abstract: In this article, we describe the TALP-UPC research group participation in the WMT19 news translation shared task for Kazakh-English. Given the small amount of parallel training data, we resort to using Russian as a pivot language, training subword-based statistical translation systems for Russian-Kazakh and Russian-English that were then used to create two synthetic pseudo-parallel corpora, for Kazakh-English and English-Kazakh respectively. Finally, a self-attention model based on the decoder part of the Transformer…
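As a rough illustration of the pivoting scheme the abstract describes, below is a minimal Python sketch of the synthetic-corpus step: the Russian side of a Kazakh-Russian corpus is machine-translated into English to produce pseudo-parallel Kazakh-English pairs. The function names and the identity placeholder are illustrative assumptions; the paper used subword-based Moses systems for the actual pivot translation.

# Sketch of pivot-based synthetic corpus creation, assuming a trained
# Russian->English system behind `translate_ru_to_en` (hypothetical stand-in).

def translate_ru_to_en(ru_sentence: str) -> str:
    # Identity placeholder so the sketch runs end to end; a real setup would
    # invoke the trained Moses RU->EN pivot system here.
    return ru_sentence

def build_synthetic_kk_en(kk_ru_pairs):
    # Each (kk, ru) pair becomes (kk, MT(ru)): the Kazakh side is kept as-is
    # and the Russian side is machine-translated into English.
    for kk, ru in kk_ru_pairs:
        yield kk, translate_ru_to_en(ru)

# Toy usage with in-memory data standing in for the aligned corpus files.
pairs = [("kazakh sentence", "russian sentence")]
synthetic_kk_en = list(build_synthetic_kk_en(pairs))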

Cited by 10 publications (8 citation statements) · References 19 publications
Citation types: 1 supporting, 7 mentioning, 0 contrasting · Citing publications from 2019 to 2022
“…TALP_UPC_2019_ENKK (Casas et al., 2019): The TALP-UPC system was trained on a combination of the original Kazakh-English data (oversampled 3x) together with synthetic corpora obtained by translating, with a BPE-based Moses system, the Russian side of the Kazakh-Russian data into English for the en-kk direction, and the Russian side of the English-Russian data into Kazakh for the kk-en direction. For the final systems, a custom model consisting of a self-attention Transformer decoder that learns joint source-target representations (with BPE tokenization) was used, implemented on top of the fairseq library.…”
Section: TALP_UPC_2019_KKEN and …
citation type: mentioning
confidence: 99%
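A minimal sketch of how such training data might be assembled, under the description above: the genuine corpus is oversampled 3x and combined with the synthetic corpus, and each pair is serialized as one sequence for a decoder-only self-attention model. The <sep>/<eos> markers are illustrative assumptions, not the system's actual BPE preprocessing or fairseq binarization.

# Sketch: combine genuine and synthetic parallel data and serialize each pair
# as a single source-target stream for a decoder-only Transformer.

def to_joint_sequence(src: str, tgt: str) -> str:
    # One training example: source and target in one stream, so the decoder
    # can learn joint source-target representations.
    return f"{src} <sep> {tgt} <eos>"

def build_training_set(genuine, synthetic, oversample=3):
    examples = []
    for _ in range(oversample):  # repeat the small genuine corpus (3x here)
        examples.extend(to_joint_sequence(s, t) for s, t in genuine)
    examples.extend(to_joint_sequence(s, t) for s, t in synthetic)
    return examples

train = build_training_set(
    genuine=[("kk src", "en tgt")],            # toy stand-ins
    synthetic=[("kk src", "pivoted en tgt")],
)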
“…Our approach allows for rapid adaptation to new languages with distinct scripts with only a minor degradation in performance on the original language pairs. [Table spilled from the citing paper — per-system BLEU scores, two translation directions as given: Kocmi and Bojar (2019): 8.7 / 18.5; Li et al. (2019): 11.1 / 30.5; Casas et al. (2019): 15.5 / 21.0; Dabre et al. (2019): 6.4 / 26.4; Briakou and Carpuat (2019): – / 9.94; Littell et al. (2019): – / 25.0; mBART (Liu et al., 2020): …] 2. Monolingual data for a single language: In this case, we compute the probabilities following the temperature-based sampling scheme that we would have obtained had we computed with this data in the first place.…”
Section: Discussion
citation type: mentioning
confidence: 99%
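The temperature-based sampling scheme mentioned above is commonly defined as p_l proportional to (n_l / sum_k n_k) ** (1/T), where n_l is the size of language l's corpus. The sketch below implements that standard definition; the citing paper's exact variant is not reproduced here, and the corpus sizes are made up for illustration.

# Temperature-based sampling over corpora: T = 1 reproduces the raw data
# distribution; larger T flattens it toward uniform, upweighting
# low-resource languages.

def temperature_sampling_probs(sizes, T=5.0):
    total = sum(sizes.values())
    weights = {lang: (n / total) ** (1.0 / T) for lang, n in sizes.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}

# Example: a large English corpus next to a small Kazakh one.
print(temperature_sampling_probs({"en": 10_000_000, "kk": 100_000}, T=5.0))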
“…Koehn and Knowles [16], Östling and Tiedemann [17], and Dowling et al. [18] found that PB-SMT can provide better translations than NMT in low-resource scenarios. In contrast to these findings, however, many studies have demonstrated that NMT is better than PB-SMT in low-resource situations [8,19]. This work investigated translations of a software localisation text for two low-resource translation pairs, Hindi-to-Tamil and English-to-Tamil, taking two MT paradigms, PB-SMT and NMT, into account.…”
Section: Related Work
citation type: mentioning
confidence: 96%
“…In this context, we refer interested readers to some of the papers [14,15] that compared phrase-based statistical machine translation (PB-SMT) and NMT on a variety of use cases. As for low-resource scenarios, as mentioned above, many studies (e.g., Koehn and Knowles [16], Östling and Tiedemann [17], Dowling et al. [18]) found that PB-SMT can provide better translations than NMT, while many others found the opposite [8,19,20]. Hence, this line of MT research has yielded a mixed bag of results, leaving the way ahead unclear.…”
Section: Introduction
citation type: mentioning
confidence: 99%