Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2385
YSDA Participation in the WMT'16 Quality Estimation Shared Task

Abstract: This paper describes the Yandex School of Data Analysis (YSDA) submission to the WMT 2016 Shared Task on Quality Estimation (QE), Task 1: sentence-level prediction of post-editing effort. We approach quality estimation as a machine learning problem, learning a regressor from the feature space to the HTER score. By enriching the baseline features with syntactic features and additional translation-system-based features, we achieve a Pearson correlation of 0.525 on the test set.
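The setup the abstract describes can be sketched as a regression from a sentence-level feature vector to an HTER score, evaluated by Pearson correlation. This is a minimal illustration with synthetic stand-in data, not the paper's actual features or model configuration; the Random Forest choice is suggested by the feature-importance analysis cited below.

```python
# Sketch: learn a regressor from QE features to HTER and report Pearson r.
# Feature values are synthetic stand-ins for the baseline + syntactic features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_sentences, n_features = 500, 17          # 17 = WMT baseline feature count
X = rng.normal(size=(n_sentences, n_features))
# Synthetic HTER in [0, 1]: a noisy function of a couple of features.
hter = np.clip(0.3 + 0.1 * X[:, 0] - 0.05 * X[:, 1]
               + 0.05 * rng.normal(size=n_sentences), 0.0, 1.0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:400], hter[:400])             # train split
pred = model.predict(X[400:])              # held-out split
r, _ = pearsonr(pred, hter[400:])
print(f"Pearson r on held-out sentences: {r:.3f}")
```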


Cited by 21 publications (16 citation statements) · References 5 publications
“…Along with the features produced by the Bilingual Expert model, we extract another 17 QE baseline features for the sentence-level task using QuEst++ and additional resources (source and target corpora, language models, n-gram counts and lexical translation tables) provided on the WMT18 QE website. Kozlova et al. (2016) verify the significance of these features using Random Forest (Breiman, 2001). Four of them are the most crucial among all according to their degrees of importance.…”
Section: Human-crafted Features
confidence: 94%
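The feature-importance check this statement attributes to Kozlova et al. (2016) can be sketched as fitting a Random Forest and ranking features by `feature_importances_`. Data and the injected signal here are synthetic illustrations, not the actual 17 baseline features.

```python
# Sketch: rank 17 features by Random Forest importance and take the top four.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 17))
# Synthetic target driven mostly by features 2 and 5.
y = 0.4 * X[:, 2] + 0.2 * X[:, 5] + 0.1 * rng.normal(size=300)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(enumerate(rf.feature_importances_), key=lambda t: -t[1])
top4 = [i for i, _ in ranked[:4]]          # indices of the 4 most important
print("Most important feature indices:", top4)
```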
“…• number of punctuation tokens in the source sentence and the translation (Kozlova et al., 2016) and the cosine between them.…”
Section: Referential Translation Machines
confidence: 99%
“…We complement character-level information with engineered features, given that the most effective QE and EIA methods in previous work heavily exploit them (Kim and Lee, 2016; Kozlova et al., 2016; Refaee and Rieser, 2016; Wang et al., 2016). To do so, we apply a simple multi-layer perceptron (MLP) over a set of input engineered features.…”
Section: Incorporating Engineered Features
confidence: 99%