2014
DOI: 10.3115/v1/w14-33
Proceedings of the Ninth Workshop on Statistical Machine Translation

Abstract: The focus of our workshop was to use parallel corpora for machine translation. Recent experimentation has shown that the performance of SMT systems varies greatly with the source language. In this workshop we encouraged researchers to investigate ways to improve the performance of SMT systems for diverse languages, including morphologically more complex languages, languages with partial free word order, and low-resource languages. Prior to the workshop, in addition to soliciting relevant papers for review and p…

Cited by 11 publications (5 citation statements)
References 83 publications
“…Similar to GPT-3, the training data for Base LM is around 90% English and includes some text in other languages that was not specifically used to train the model to perform machine translation. We also evaluate FLAN's performance on machine translation for the three datasets evaluated in the GPT-3 paper: French-English from WMT'14 (Bojar et al, 2014), and German-English and Romanian-English from WMT'16 (Bojar et al, 2016).…”
Section: Translationmentioning
confidence: 99%
“…We perform our comparison on four datasets of varying sizes: IWSLT14 German-English (Federico et al, 2014), WMT14 English-{German, French} (Bojar et al, 2014), and WMT18 English-Chinese (Bojar et al, 2018). We split each dataset into train and validation pairs and evaluate DE→EN models on the test sets TED-{dev10, dev12, tst10, tst11, tst12}, EN-{DE, FR} models on newstest14, and EN→ZH models on newstest17.…”
Section: Datasets and Evaluationmentioning
confidence: 99%
“…Dataset We test our quantization strategy in 3 different translation directions: English-to-German (En2De), English-to-French (En2Fr), and English-to-Japanese (En2Jp). For En2De and En2Fr, we utilize all of the trainset of WMT2014 and use newstest2013 as devset and newstest2014 as testset (Bojar et al, 2014). For En2Jp, we use the KFTT (Neubig, 2011), JESC (Pryzant et al, 2018), and WIT3 (Cettolo et al, 2012) corpora.…”
Section: Experimental Settingsmentioning
confidence: 99%