This study describes an experiment with different estimations of reliability. Reliability reflects the technical quality of a measurement procedure, such as the automatic evaluation of machine translation (MT). In our case, reliability is an indicator of measurement accuracy: measuring the accuracy and error rate of MT output with automatic metrics (precision, recall, F-measure, BLEU-n, WER, PER, and CDER). The experiment identified metrics (BLEU-4 and WER) that reduce the overall reliability of the automatic evaluation of accuracy and error rate when reliability is estimated using entropy. Based on the results, we argue that estimating reliability with entropy yields more accurate results than conventional reliability estimations (Cronbach's alpha and correlation). MT evaluation based on n-grams or edit distance, combined with entropy, could offer a new view of lexicon-based metrics in comparison to the commonly used ones.
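To make the two families of reliability estimation concrete, the sketch below computes Cronbach's alpha over a set of metric score vectors and a Shannon-entropy value for a single metric's score distribution. The per-sentence scores are invented for illustration (they are not data from the study), and the entropy here is plain Shannon entropy over discretised scores, which is only an assumption about how an entropy-based estimate might be formed, not the study's exact procedure.

```python
import math
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of item score lists (one list per metric).

    alpha = k/(k-1) * (1 - sum(item variances) / variance of item totals)
    """
    k = len(scores)
    item_vars = sum(pvariance(item) for item in scores)
    totals = [sum(vals) for vals in zip(*scores)]
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

def shannon_entropy(values, bins=10):
    """Shannon entropy (in bits) of scores discretised into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

# Hypothetical per-sentence scores for three metrics (illustrative only)
precision = [0.71, 0.64, 0.80, 0.58, 0.75]
recall    = [0.68, 0.60, 0.77, 0.55, 0.72]
bleu4     = [0.35, 0.22, 0.51, 0.18, 0.40]

alpha = cronbach_alpha([precision, recall, bleu4])
print(f"Cronbach's alpha: {alpha:.3f}")
print(f"entropy(BLEU-4): {shannon_entropy(bleu4, bins=4):.3f} bits")
```

A metric whose scores carry little information (low entropy) or whose removal raises the group's alpha would be a candidate for lowering overall reliability, which is the kind of diagnosis the study applies to BLEU-4 and WER.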