Quality Estimation and Translation Metrics via Pre-trained Word and Sentence Embeddings

Yankovskaya, Elizaveta; Tättar, Andre; Fishel, Mark

doi:10.18653/v1/w19-5410

Cited by 29 publications

(15 citation statements)

References 18 publications


“…Finally, their systems were pre-trained on synthetic data, obtained by taking all of the WMT submissions from earlier years and using chrF (Popović, 2015) as the synthetic output. The approach is described in greater detail in (Yankovskaya et al, 2019).…”

Section: Utartu

mentioning

confidence: 99%

Findings of the WMT 2019 Shared Tasks on Quality Estimation

Fonseca, Yankovskaya, Martins et al. 2019

Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)

Self Cite


We report the results of the WMT19 shared task on Quality Estimation, i.e. the task of predicting the quality of the output of machine translation systems given just the source text and the hypothesis translations. The task includes estimation at three granularity levels: word, sentence and document. A novel addition is evaluating sentence-level QE against human judgments: in other words, designing MT metrics that do not need a reference translation. This year we include three language pairs, produced solely by neural machine translation systems. Participating teams from eleven institutions submitted a variety of systems to different task variants and language pairs.


“…Our models were not trained on Gujarati (gu). For brevity, only the best QE-metric for each language pair is shown; for full results see Appendix G. a: YISI-2 (Lo, 2019) b: YISI-2 SRL (Lo, 2019) c: UNI (Yankovskaya et al, 2019) d: UNI+ (Yankovskaya et al, 2019).…”

mentioning

confidence: 99%

Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

Thompson, Post 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference. We propose training the paraphraser as a multilingual NMT system, treating paraphrasing as a zero-shot translation task (e.g., Czech to Czech). This results in the paraphraser's output mode being centered around a copy of the input sequence, which represents the best case scenario where the MT system output matches a human reference. Our method is simple and intuitive, and does not require human judgements for training. Our single model (trained in 39 languages) outperforms or statistically ties with all prior metrics on the WMT 2019 segment-level shared metrics task in all languages (excluding Gujarati where the model had no training data). We also explore using our model for the task of quality estimation as a metric (conditioning on the source instead of the reference) and find that it significantly outperforms every submission to the WMT 2019 shared task on quality estimation in every language pair.


“…We compare with a range of reference-free metrics: ibm1-morpheme and ibm1-pos4gram (Popović, 2012), LASIM (Yankovskaya et al, 2019), LP (Yankovskaya et al, 2019), YiSi-2 and YiSi-2-srl (Lo, 2019), and reference-based baselines BLEU (Papineni et al, 2002), SentBLEU (Koehn et al, 2007) and ChrF++ (Popović, 2017) for MT evaluation (see §2). The main results are reported on WMT17.…”

Section: Baselines

mentioning

confidence: 99%

On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation

Zhao, Glavaš, Peyrard et al. 2020

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics


Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity. In this paper, we concern ourselves with reference-free machine translation (MT) evaluation where we directly compare source texts to (sometimes low-quality) system translations, which represents a natural adversarial setup for multilingual encoders. Reference-free evaluation holds the promise of web-scale comparison of MT systems. We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER. We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations, namely, (a) a semantic mismatch between representations of mutual translations and, more prominently, (b) the inability to punish "translationese", i.e., low-quality literal translations. We propose two partial remedies: (1) post-hoc re-alignment of the vector spaces and (2) coupling of semantic-similarity based metrics with target-side language modeling. In segment-level MT evaluation, our best metric surpasses reference-based BLEU by 5.7 correlation points. We make our MT evaluation code available.
