Proceedings of the Ninth Workshop on Statistical Machine Translation 2014
DOI: 10.3115/v1/w14-3354
BEER: BEtter Evaluation as Ranking

Abstract: We present the UvA-ILLC submission of the BEER metric to the WMT14 metrics task. BEER is a sentence-level metric that combines a large number of features in a linear model. Our novel contributions are (1) efficient tuning of a large number of features to maximize correlation with human system rankings, and (2) new features that yield smoother sentence-level scores.
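The linear model described in the abstract can be sketched as a weighted sum of feature values computed for each hypothesis sentence. The feature names and weights below are illustrative assumptions for exposition, not BEER's actual feature set or learned parameters.

```python
# Illustrative sketch of a sentence-level metric as a linear model over
# features, in the spirit of BEER. Feature names and weights here are
# hypothetical placeholders, not the metric's real configuration.

def sentence_score(features, weights):
    """Score a hypothesis sentence as a weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical features extracted from a (hypothesis, reference) pair.
features = {
    "char_ngram_precision": 0.72,   # character n-gram match precision
    "char_ngram_recall": 0.65,      # character n-gram match recall
    "reordering_penalty": -0.10,    # word-order divergence from reference
}

# In BEER, weights of this kind are tuned to maximize correlation
# with human rankings; these values are made up for the example.
weights = {
    "char_ngram_precision": 0.4,
    "char_ngram_recall": 0.5,
    "reordering_penalty": 0.3,
}

print(round(sentence_score(features, weights), 3))  # → 0.583
```

Because the model is linear, sentence scores decompose transparently into per-feature contributions, which is what makes large feature sets tractable to tune.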

Cited by 66 publications (55 citation statements) · References 11 publications
“…Because our method involves transliteration, which is applied at a character level, we found it also useful to evaluate the output with character-based metrics, which reward some translations even if the morphology is not completely correct. For this reason, we additionally report BEER (Stanojević and Sima'an 2014) and chrF3 (Popović 2015) scores.…”
Section: Neural Machine Translation System
confidence: 99%
“…Following standard practice, we tune on BLEU, and after tuning we use the configuration with the highest scores on the development set with actual (corpus-level) BLEU evaluation. We report lowercase BLEU (Papineni et al. 2002), METEOR (Denkowski and Lavie 2011), BEER (Stanojević and Sima'an 2014) and TER (Snover et al. 2006) scores for the test set. We also report average translation length as a percentage of the reference length for all systems.…”
Section: Experimental Structure
confidence: 99%
“…Nevertheless, both contribute evidence to the thesis that word order can be significantly improved without using syntax. Stanojević and Sima'an (2014) propose a new and highly successful machine translation evaluation method called BEER. This metric uses a multitude of weighted features, with weights that are directly trained to maximize correlation with human ranking.…”
Section: Learning Labels
confidence: 99%
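The tuning idea mentioned above, learning feature weights directly from human pairwise rankings, can be sketched with a simple perceptron-style update over hypothesis pairs. This is only a minimal illustration of the concept, not BEER's actual learning-to-rank procedure, and the toy data is invented.

```python
# Illustrative sketch of tuning metric weights from human pairwise
# rankings, perceptron-style. Not BEER's exact training procedure,
# just the core idea: the human-preferred hypothesis should score higher.

def dot(w, f):
    return sum(w[i] * f[i] for i in range(len(w)))

def tune(pairs, n_features, epochs=10, lr=0.1):
    """pairs: list of (better, worse) feature vectors from human rankings."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            # If the preferred translation does not already score higher,
            # nudge the weights toward its feature vector.
            if dot(w, better) <= dot(w, worse):
                w = [w[i] + lr * (better[i] - worse[i]) for i in range(n_features)]
    return w

# Toy data: feature 0 correlates with human preference, feature 1 does not.
pairs = [([0.9, 0.2], [0.3, 0.4]), ([0.8, 0.5], [0.2, 0.5])]
w = tune(pairs, n_features=2)
print([round(x, 2) for x in w])
```

After tuning, the preferred hypothesis in each training pair scores higher than its alternative, which is the ranking behavior the weights are optimized for.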
“…Many metrics have been proposed for MT that compare system translations against human references, with the most popular being BLEU (Papineni et al., 2002), METEOR (Denkowski and Lavie, 2014), TER (Snover et al., 2006), and, more recently, BEER (Stanojević and Sima'an, 2014). These and other automatic metrics are often criticised for providing scores that can be non-intuitive and uninformative, especially at the sentence level (Zhang et al., 2004; Song et al., 2013; Babych, 2014).…”
Section: Introduction
confidence: 99%