Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2302

Results of the WMT16 Metrics Shared Task

Abstract: This paper presents the results of the WMT13 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in WMT13 Shared Translation Task. We collected scores of 16 metrics from 8 research groups. In addition to that we computed scores of 5 standard metrics such as BLEU, WER, PER as baselines. Collected scores were evaluated in terms of system level correlation (how well each metric's scores correlate with WMT13 official human scores) and in terms of segment level co…

Cited by 87 publications (101 citation statements)
References 21 publications

“…Such meta-evaluation commonly takes the form of the degree to which metrics scores correlate with human assessment. In MT, the stronger the correlation of a metric with human assessment, the better the metric is considered to be [12].…”
Section: Human Evaluation In Machine Translation (mentioning)
confidence: 99%
“…Below we describe the obtained results for newstest2016 (Bojar et al, 2016b) and compare them with results of metrics tasks. At the time of publication of the article, results of newstest2019 were not yet available.…”
Section: Results (mentioning)
confidence: 99%
“…1 CHRF3 (Popović, 2015) 2 SIMPBLEU-RECALL (Song et al, 2013) 3 NIST (Doddington, 2002) 4 BEER (Stanojević and Sima'an, 2014) Table 3: The preliminary results of the WMT16 metrics task: Absolute Pearson correlation of out-of-English and to-English system-level metric scores. All results are cited from (Bojar et al, 2016).…”
Section: Comparison With Other Metrics (mentioning)
confidence: 99%
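
The citation statements above describe metric meta-evaluation as the correlation between metric scores and human assessment, reported at system level as an absolute Pearson correlation. A minimal sketch of that computation follows. The function names and the example scores are hypothetical, invented only for illustration; they are not taken from the WMT16 data or from any shared-task tooling.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def system_level_correlation(metric_scores, human_scores):
    """Absolute Pearson correlation, one score per MT system (assumed setup)."""
    return abs(pearson(metric_scores, human_scores))


# Hypothetical scores for four MT systems (made-up numbers).
metric_scores = [0.21, 0.25, 0.30, 0.34]
human_scores = [-0.40, -0.10, 0.15, 0.35]
print(system_level_correlation(metric_scores, human_scores))
```

Taking the absolute value lets error-rate metrics, where lower is better, be compared on the same scale as metrics where higher is better.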