Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.3115/v1/d14-1025

Fitting Sentence Level Translation Evaluation with Many Dense Features

Abstract: Sentence level evaluation in MT has turned out to be far more difficult than corpus level evaluation. Existing sentence level metrics employ a limited set of features, most of which are rather sparse at the sentence level, and their intricate models are rarely trained for ranking. This paper presents a simple linear model exploiting 33 relatively dense features, some of which are novel while others are known but seldom used, and trains it under the learning-to-rank framework. We evaluate our metric on the standard WM…
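The approach described in the abstract, a linear model over dense features trained to rank, can be sketched in a few lines. The following is a minimal illustration of pairwise learning-to-rank with a perceptron-style update; it is not the authors' actual trainer or feature set, and all names are hypothetical.

```python
import numpy as np

def train_pairwise_ranker(feature_pairs, epochs=10, lr=0.1):
    """Learn a linear scoring weight vector from human ranking judgements.

    feature_pairs: list of (f_better, f_worse) tuples, where each element
    is a dense feature vector (np.ndarray) for a candidate translation of
    the same source sentence. A pairwise perceptron update pushes the
    model toward w . f_better > w . f_worse.
    """
    dim = len(feature_pairs[0][0])
    w = np.zeros(dim)
    for _ in range(epochs):
        for f_better, f_worse in feature_pairs:
            # Update only when the current model ranks the pair wrongly.
            if w.dot(f_better) <= w.dot(f_worse):
                w += lr * (f_better - f_worse)
    return w

def score(w, features):
    """Sentence-level metric score: a dot product of weights and features."""
    return float(w.dot(features))
```

Training on preference pairs rather than absolute scores matches the WMT evaluation setting, where human judgements come as relative rankings of candidate translations.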

Cited by 47 publications (44 citation statements) · References 10 publications · Citing publications span 2015–2023

“…Recent investigations have shown that character level n-grams play an important role for automatic evaluation as a part of more complex metrics such as MTERATER (Parton et al., 2011) and BEER (Stanojević and Sima'an, 2014a; Stanojević and Sima'an, 2014b). However, they have not been investigated as an individual metric so far.…”
Section: Introduction (mentioning)
confidence: 99%
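The character-level n-grams mentioned in the excerpt above can be illustrated with a toy metric: a character n-gram F-score between a hypothesis and a reference. This is an illustrative sketch only, not the BEER or MTERATER implementation.

```python
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams, spaces included."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def char_ngram_fscore(hypothesis, reference, n=4, beta=1.0):
    """Harmonic mean of character n-gram precision and recall."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

Because character n-grams match across inflectional variants of the same stem, scores like this degrade more gracefully on morphologically rich languages than word-level overlap does.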
“…Finally, we have a set of systems that are optimized to improve target morphology. The automatic scores of the systems submitted at WMT'17 are in Table 4, where we report BLEU, BEER (Stanojević and Sima'an, 2014) and CharacTER (Wang et al., 2016). We also computed a morphology accuracy for these systems.…”
Section: Methods (mentioning)
confidence: 99%
“…BLEU scores for English-Czech results are in Table 5, where we provide, in addition to BLEU, scores computed by BEER (Stanojević and Sima'an, 2014) and CharacTER (Wang et al., 2016). These two metrics were shown by Bojar et al. (2016) to be better adapted to MRLs.…”
Section: Small System (mentioning)
confidence: 99%