Proceedings of the Second Conference on Machine Translation 2017
DOI: 10.18653/v1/w17-4737

Tilde's Machine Translation Systems for WMT 2017

Abstract: The paper describes Tilde's English-Latvian and Latvian-English machine translation systems for the WMT 2017 shared task in news translation. Both constrained and unconstrained systems are described. Our constrained systems were ranked as the best performing systems according to the automatic evaluation results. The paper gives details on how we pre-processed training data, the NMT system architecture that we used for training the NMT models, the SMT systems and their usage in NMT-SMT hybrid system configuration…

Cited by 22 publications (34 citation statements)
References 9 publications
“…Previous work also explored generative morphological models, known as Factored Translation Models, that explicitly integrate additional linguistic markup at the word level to learn morphology. In NMT training, using subword units such as byte-pair encoding (Sennrich, Haddow, and Birch 2016) has become a de facto standard in training competition-grade systems (Pinnis et al. 2017). A few have tried morpheme-based segmentation (Bradbury and Socher 2016), and several even used character-based systems (Chung, Cho, and Bengio 2016; Lee, Cho, and Hofmann 2017) to achieve similar performance as the BPE-segmented systems.…”
Section: Morphology
confidence: 99%
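The excerpt above treats byte-pair encoding (Sennrich, Haddow, and Birch 2016) as the de facto subword segmentation for competition-grade NMT systems. For reference, here is a minimal, illustrative sketch of the BPE merge-learning step on a toy character-level vocabulary; the toy word frequencies are invented, and this is not the implementation used by any of the cited systems.

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the given pair into a single symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq for word, freq in vocab.items()}

# Toy frequency dictionary: words split into characters plus an end-of-word marker.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2, 'n e w e s t </w>': 6, 'w i d e s t </w>': 3}

for _ in range(10):  # learn 10 merge operations
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = max(stats, key=stats.get)
    vocab = merge_pair(best, vocab)
    print(best)
```

Applying the learned merges to new text then segments rare words into known subword units, which is what lets a fixed vocabulary cover an open-ended word list.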
“…The approximate size of each of the parallel corpora is 1.6M sentences. As a starting point, we use the data as pre-processed (filtered, normalised, tokenised) by Pinnis et al. (2017b) for their experiments.…”
Section: Experiments and Results
confidence: 99%
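The excerpt re-uses parallel data already filtered, normalised and tokenised by Pinnis et al. (2017b). Those concrete pre-processing rules are not reproduced here; the following is only an assumed sketch of what such a filter/normalise/tokenise pipeline can look like, with illustrative thresholds (max_len, max_ratio) and a deliberately naive tokeniser.

```python
import re
import unicodedata

def normalise(line: str) -> str:
    """Unicode-normalise and collapse whitespace (illustrative; real normalisation rules differ)."""
    line = unicodedata.normalize('NFC', line)
    return ' '.join(line.split())

def tokenise(line: str) -> list[str]:
    """Very naive word/punctuation split, standing in for a real tokeniser."""
    return re.findall(r"\w+|[^\w\s]", line, re.UNICODE)

def keep_pair(src: str, tgt: str, max_len: int = 100, max_ratio: float = 3.0) -> bool:
    """Simple parallel-corpus filter: drop empty, overly long, or length-mismatched pairs."""
    ls, lt = len(tokenise(src)), len(tokenise(tgt))
    if ls == 0 or lt == 0 or ls > max_len or lt > max_len:
        return False
    return max(ls, lt) / min(ls, lt) <= max_ratio

# Example: pre-process and filter one (invented) sentence pair.
src = normalise("Tilde's MT systems for WMT 2017.")
tgt = normalise("Tildes MT sistēmas WMT 2017.")
if keep_pair(src, tgt):
    print(' '.join(tokenise(src)), '|||', ' '.join(tokenise(tgt)))
```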
“…In WMT 2017, Tilde participated with MLSTM-based NMT systems (Pinnis et al., 2017c). In this paper, we compare the MLSTM-based models with Transformer models for English-Estonian and Estonian-English, and we show that the state of the art of WMT 2017 is well behind the new models.…”
Section: Introduction
confidence: 87%
“…NMT models so far have struggled with translating rare or unseen words (not different surface forms, but rather different words) correctly (Pinnis et al, 2017c). Named entities and non-translatable entities (various product names, identifiers, etc.)…”
Section: Automatic Post-editing of Named Entities
confidence: 99%
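The excerpt points at a known weakness of NMT on rare or unseen tokens such as product names and identifiers, which motivates automatic post-editing of named entities. The sketch below is a hypothetical repair pass, not the cited paper's method: the regular expression and the append-missing-entities strategy are assumptions used purely to illustrate the idea of checking that non-translatable tokens from the source survive in the hypothesis.

```python
import re

# Hypothetical post-editing pass (illustrative only): re-insert non-translatable
# tokens such as product codes that the NMT output dropped or garbled.
NON_TRANSLATABLE = re.compile(r'\b[A-Z0-9]{2,}(?:-[A-Z0-9]+)*\b')  # e.g. "XR-500", "GPU"

def post_edit(source: str, hypothesis: str) -> str:
    src_entities = NON_TRANSLATABLE.findall(source)
    hyp_entities = set(NON_TRANSLATABLE.findall(hypothesis))
    missing = [e for e in src_entities if e not in hyp_entities]
    if missing:
        # Naive repair: append the missing entities so they at least appear in the output.
        hypothesis = hypothesis.rstrip('.') + ' (' + ', '.join(missing) + ').'
    return hypothesis

# "XR-500" is an invented product code used purely for illustration.
print(post_edit("Install the XR-500 driver.", "Instalējiet draiveri."))
# -> Instalējiet draiveri (XR-500).
```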