Findings of the Association for Computational Linguistics: EMNLP 2020 (2020)
DOI: 10.18653/v1/2020.findings-emnlp.82
A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing

Abstract: We investigate a long-perceived shortcoming in the typical use of BLEU: its reliance on a single reference. Using modern neural paraphrasing techniques, we study whether automatically generating additional diverse references can provide better coverage of the space of valid translations and thereby improve its correlation with human judgments. Our experiments on the into-English language directions of the WMT19 metrics task (at both the system and sentence level) show that using paraphrased references does gen…
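The core idea in the abstract can be illustrated with multi-reference BLEU: when paraphrases are added as extra references, an n-gram in the hypothesis counts as matched if it appears in any of them. The sketch below is a minimal illustration using NLTK's sentence-level BLEU; the hand-written paraphrases are stand-ins for the neural paraphraser used in the paper, not output from it.

```python
# Minimal sketch: single-reference vs. multi-reference sentence BLEU.
# The "paraphrases" here are hand-written stand-ins for neural paraphrases.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

hypothesis = "the cat sat on the mat".split()
reference = "the cat was sitting on the mat".split()
paraphrases = [
    "a cat sat on the mat".split(),
    "the cat sat down on the mat".split(),
]

smooth = SmoothingFunction().method1  # avoid zero scores on short sentences

# Single-reference BLEU: only the original reference is available.
single = sentence_bleu([reference], hypothesis, smoothing_function=smooth)

# Multi-reference BLEU: n-gram matches may come from any reference, which
# typically rewards valid wording choices the single reference misses.
multi = sentence_bleu([reference] + paraphrases, hypothesis, smoothing_function=smooth)

print(f"single-reference BLEU: {single:.3f}")
print(f"multi-reference  BLEU: {multi:.3f}")
```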

Cited by 9 publications (6 citation statements); References 30 publications
“…Add Diverse References During Training: From Section 4.2, we find that both the neural metric and the task-specific model are not robust to paraphrases. We also recommend the inclusion of diverse references through automatic paraphrasing (Bawden et al., 2020) or data augmentation during the training of neural metrics.…”
Section: Recommendations
confidence: 99%
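The recommendation quoted above could be realized along the lines of the sketch below: pair each training example of a learned metric with copies whose reference has been paraphrased. This is an assumption-laden illustration, not code from the cited work; `paraphrase` is a hypothetical stand-in for any neural paraphraser, and the data is toy data.

```python
# Sketch: augment a metric's training set with paraphrased references.
# `paraphrase` is a hypothetical stand-in for a neural paraphraser.
from typing import Callable, List, Tuple

Example = Tuple[str, str, float]  # (hypothesis, reference, human score)

def augment_with_paraphrases(
    data: List[Example],
    paraphrase: Callable[[str, int], List[str]],
    n_paraphrases: int = 2,
) -> List[Example]:
    """Return the original examples plus copies whose reference is paraphrased.

    The human score is kept unchanged, on the assumption that a valid
    paraphrase of the reference should not change how good the hypothesis is.
    """
    augmented: List[Example] = list(data)
    for hyp, ref, score in data:
        for new_ref in paraphrase(ref, n_paraphrases):
            augmented.append((hyp, new_ref, score))
    return augmented

def toy_paraphrase(reference: str, n: int) -> List[str]:
    # Illustration only: trivial canned rewrites instead of a real paraphraser.
    return [reference.replace("was sitting", "sat"),
            "on the mat the cat was sitting"][:n]

train = [("the cat sat on the mat", "the cat was sitting on the mat", 0.8)]
print(augment_with_paraphrases(train, toy_paraphrase))
```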
“…It is essential to know that the more reference translations there are per sentence, the higher the value. To produce a high BLEU value, the length of the translated sentence must be close to the length of the reference sentence, and the translated sentence must use the same words and word order as the reference sentence [44]. The BLEU formula can be seen in Eq.…”
Section: E. Evaluation
confidence: 99%
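The equation referred to in the quote above did not survive extraction. For reference, the standard corpus-level BLEU definition (Papineni et al., 2002), which the quoted description paraphrases, is:

```latex
% Standard corpus-level BLEU: p_n is the modified n-gram precision, w_n the
% (usually uniform) weights, c the total candidate length, r the effective
% reference length.
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big( \sum_{n=1}^{N} w_n \log p_n \Big),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```

The brevity penalty BP enforces the length requirement mentioned in the quote, while the modified n-gram precisions p_n reward matching words and word order against the reference(s).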
“…For each paraphrasing method and each dataset, metrics are computed over unlabeled sentences and their paraphrases. To assess the diversity of paraphrases generated by the different methods, the BLEU metric, although popular in Neural Machine Translation, is a poor choice (Bawden et al., 2020). We use the bi-gram diversity (dist-2) metric proposed by Ippolito et al. (2019), which divides the number of distinct 2-grams by the total number of tokens.…”
Section: Evaluation of Paraphrase Diversity
confidence: 99%
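As a minimal sketch of the dist-2 metric described in the quote above (distinct 2-grams divided by the total number of tokens), assuming simple whitespace tokenization, which may differ from the tokenization used in the cited work:

```python
# Sketch of dist-2 (bi-gram diversity): distinct 2-grams / total tokens,
# computed over a set of paraphrases. Whitespace tokenization for illustration.
from typing import List

def dist_2(sentences: List[str]) -> float:
    bigrams = set()
    total_tokens = 0
    for sentence in sentences:
        tokens = sentence.split()
        total_tokens += len(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return len(bigrams) / total_tokens if total_tokens else 0.0

paraphrases = [
    "the cat sat on the mat",
    "the cat was sitting on the mat",
    "a cat sat on a mat",
]
print(f"dist-2 = {dist_2(paraphrases):.3f}")
```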