2016
DOI: 10.48550/arxiv.1605.04515
Preprint

Machine Translation Evaluation Resources and Methods: A Survey

Abstract: We introduce the Machine Translation (MT) evaluation survey that contains both manual and automatic evaluation methods. The traditional human evaluation criteria mainly include intelligibility, fidelity, fluency, adequacy, comprehension, and informativeness. The advanced human assessments include task-oriented measures, post-editing, segment ranking, and extended criteria, etc. We classify the automatic evaluation methods into two categories, including lexical similarity scenario and linguistic features a…

Cited by 11 publications (13 citation statements) | References 33 publications
“…In particular, this metric measures the degree of n-gram overlapping between the strings of words produced by the model and the human translation references at the corpus level. BLEU measures translation quality by the accuracy of translating n-grams to n-grams, for n-gram of size 1 to 4 [33]. The Exact match accuracy (ACC) is another automatic metric often used for evaluating neural machine translation [56,89,90,91].…”
Section: Feasibility in Applying NMT for Shellcode Generation (mentioning, confidence: 99%)
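The corpus-level BLEU computation described in this snippet can be sketched in a few lines of Python. The sketch below is an unsmoothed, single-reference illustration of modified n-gram precision (n = 1 to 4) combined with a brevity penalty; it is a simplified reading of the description above, not the implementation used by the citing works.

```python
# Minimal corpus-level BLEU sketch (n-grams of size 1 to 4), for illustration only;
# a production setup would use a standard implementation such as sacrebleu.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """hypotheses/references: parallel lists of token lists (one reference per hypothesis)."""
    clipped = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # total hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            clipped[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(clipped) == 0 or min(totals) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives a zero score
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_prec)

# A hypothesis that matches its reference but is one token shorter:
# all n-gram precisions are 1, so the score reflects only the brevity penalty (~0.85).
print(corpus_bleu([["the", "cat", "sat", "on", "the", "mat"]],
                  [["the", "cat", "sat", "on", "the", "mat", "today"]]))
```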
“…Log-MNEXT = F-score × (1 − Frag Penalty) (6). To get the F-score, precision (P) and recall (R) are calculated by assigning weights to the various unigram matches: P = (Σᵢ wᵢ · mᵢ) / ℓ_pred and R = (Σᵢ wᵢ · mᵢ) / ℓ_ref (7), where for a matcher type i ∈ {exact match, stemmed match, synonym match}, wᵢ is the weight and mᵢ is the number of matched unigrams; ℓ_pred is the length of the predicted text and ℓ_ref is the length of the reference text.…”
Section: Frag Penalty (mentioning, confidence: 99%)
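Read that way, the metric can be sketched as a weighted-unigram F-score scaled by a fragmentation penalty. In the Python sketch below, the matcher weights, the harmonic-mean form of the F-score, and the penalty parameters are all illustrative assumptions; the citing paper's exact parameterisation is not given here.

```python
# Illustrative sketch of a METEOR-style score with a fragmentation penalty;
# weights, alpha, gamma and beta are assumed values, not the cited paper's setup.

def weighted_f_score(matches, weights, pred_len, ref_len, alpha=0.9):
    """matches/weights: per matcher type (exact, stemmed, synonym) counts and weights."""
    weighted = sum(w * m for w, m in zip(weights, matches))
    precision = weighted / pred_len
    recall = weighted / ref_len
    if precision == 0 or recall == 0:
        return 0.0
    # Weighted harmonic mean of precision and recall (illustrative, recall-favouring).
    return precision * recall / (alpha * precision + (1 - alpha) * recall)

def penalised_score(f_score, chunks, total_matches, gamma=0.5, beta=3.0):
    """Scale the F-score by a fragmentation penalty based on how many contiguous
    chunks the matched unigrams form (fewer chunks = better word order)."""
    frag_penalty = gamma * (chunks / total_matches) ** beta if total_matches else 0.0
    return f_score * (1 - frag_penalty)

# Example: 10 exact, 2 stemmed, 1 synonym match over a 15-token prediction
# and a 14-token reference, with the matches grouped into 4 chunks.
f = weighted_f_score(matches=[10, 2, 1], weights=[1.0, 0.6, 0.8], pred_len=15, ref_len=14)
print(penalised_score(f, chunks=4, total_matches=13))
```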
“…There are works comparing [3,7,14,18] automated evaluation metrics for machine translation. Our work is different from them because we discuss the suitability of metrics used in evaluating commit message generation tools.…”
Section: Related Work (mentioning, confidence: 99%)
“…In particular, this metric measures the degree of n-gram overlapping between the strings of words produced by the model and the human translation references at the corpus level. BLEU measures translation quality by the accuracy of translating n-grams to n-grams, for n-gram of size 1 to 4 [57]. The Exact match accuracy (ACC) is another automatic metric often used for evaluating neural machine translation [7], [8], [15], [16].…”
Section: A. Automatic Evaluation (mentioning, confidence: 99%)
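The exact match accuracy (ACC) referred to in these snippets is simply the fraction of model outputs that are identical to their reference. A minimal sketch follows; the whitespace normalisation step is an illustrative choice, not a prescribed part of the metric.

```python
# Minimal sketch of exact match accuracy (ACC): the fraction of predictions that
# match their reference exactly. Whitespace normalisation is an assumed convention.

def exact_match_accuracy(predictions, references):
    assert len(predictions) == len(references)
    normalise = lambda s: " ".join(s.split())
    hits = sum(normalise(p) == normalise(r) for p, r in zip(predictions, references))
    return hits / len(predictions)

# One of the two predictions matches its reference exactly, so ACC = 0.5.
print(exact_match_accuracy(["mov eax , 1", "int 0x80"],
                           ["mov eax, 1", "int 0x80"]))
```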