Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.558
UniTE: Unified Translation Evaluation

Abstract: Translation quality evaluation plays a crucial role in machine translation. According to the input format, it is mainly separated into three tasks, i.e., reference-only, source-only and source-reference-combined. Recent methods, despite their promising results, are specifically designed and optimized on one of them. This limits the convenience of these methods, and overlooks the commonalities among tasks. In this paper, we propose UniTE, which is the first unified framework engaged with abilities to handle all…

Cited by 24 publications (24 citation statements) · References 4 publications
“…Likewise, having access to only the source would correspond to sentence difficulty/complexity estimation. Similarly to Wan et al. (2022) and Don-Yehiya et al. (2022), we explored both of these modes and found very high sentence-level correlations.…”
Section: Complexity and Fluency Estimation
confidence: 62%
“…There are two standard methods to extract the total embedding, i.e., averaging all token embeddings and using the first token embedding. Ranasinghe et al. (2020) and Wan et al. (2022) show the superiority of using the first token embedding over averaging all token embeddings. Thus, we employ the final embedding of the first token, e_first, as the representation of the unified input x.…”
Section: Unified Embedding
confidence: 89%
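The first-token pooling strategy described in this excerpt can be sketched in a few lines. A minimal illustration, assuming a generic encoder output matrix; `pool_embeddings` and the toy values are illustrative names, not the cited authors' code:

```python
import numpy as np

def pool_embeddings(token_embeddings: np.ndarray, strategy: str = "first") -> np.ndarray:
    """Collapse a (seq_len, hidden) matrix of token embeddings into one vector.

    'first' -> embedding of the first token (e.g. [CLS]/<s>), the strategy
               the excerpt reports as superior.
    'mean'  -> average over all token embeddings.
    """
    if strategy == "first":
        return token_embeddings[0]
    if strategy == "mean":
        return token_embeddings.mean(axis=0)
    raise ValueError(f"unknown strategy: {strategy}")

# Toy encoder output: 4 tokens, hidden size 3.
enc = np.array([[1.0, 0.0, 2.0],
                [0.0, 1.0, 0.0],
                [1.0, 1.0, 1.0],
                [2.0, 0.0, 1.0]])

e_first = pool_embeddings(enc, "first")  # -> [1.0, 0.0, 2.0]
e_mean = pool_embeddings(enc, "mean")    # -> [1.0, 0.5, 1.0]
```

In a real model the matrix would be the encoder's last hidden states, and `e_first` would feed the downstream prediction head.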
“…They likewise face the same issue of poor evaluation metrics (EMs). A recent popular trend in evaluating text generation is the design of automatic EMs based on large language models (LLMs) (Zhao et al., 2019; Sellam et al., 2020; Yuan et al., 2021; Wan et al., 2022). COMET (Rei et al., 2020) and BERTScore (Zhang et al., 2020) are two typical LLM-based text EMs, where COMET provides a text EM by learning from human judgments on training data and BERTScore is computed using contextualized token embeddings from BERT (Devlin et al., 2019).…”
Section: Introduction
confidence: 99%
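The core of the BERTScore computation mentioned here — greedy cosine matching between candidate and reference token embeddings — can be sketched as follows. This is a simplified illustration (it omits IDF weighting and baseline rescaling from the actual metric), and `bertscore_f1` is a hypothetical name:

```python
import numpy as np

def bertscore_f1(cand: np.ndarray, ref: np.ndarray) -> float:
    """Simplified BERTScore F1 via greedy cosine matching.

    cand: (n_cand, hidden) candidate token embeddings
    ref:  (n_ref, hidden) reference token embeddings
    """
    # Normalise rows so dot products become cosine similarities.
    c = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    r = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    sim = c @ r.T                       # (n_cand, n_ref) similarity matrix
    recall = sim.max(axis=0).mean()     # each ref token -> best candidate match
    precision = sim.max(axis=1).mean()  # each candidate token -> best ref match
    return 2 * precision * recall / (precision + recall)

# Identical embeddings should yield a perfect score.
m = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
score = bertscore_f1(m, m)
```

In practice the embeddings come from a pretrained encoder such as BERT; here toy vectors stand in so the matching logic is visible.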
“…In order to tackle this problem, it is better to use semantic evaluation metrics such as reconstruction-BLEU, COMET (Rei et al., 2020), BLEURT (Sellam et al., 2020), UniTE (Wan et al., 2022) and so on. Here, we choose reconstruction-BLEU, which uses a reverse NMT model trained on the initial single-reference corpus to translate each candidate ŷ_k^(n) back to a source sentence x_k^(n), and then evaluates the BLEU score between…”
Section: Reward Computation
confidence: 99%
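The reconstruction-BLEU reward described in this excerpt can be sketched as: back-translate the candidate with a reverse model, then score the result against the original source with BLEU. A minimal sketch under stated assumptions — `back_translate` is a hypothetical stub standing in for the trained reverse NMT model, and `sentence_bleu` is a plain smoothed implementation, not the authors' exact scorer:

```python
import math
from collections import Counter

def sentence_bleu(hyp: list, ref: list, max_n: int = 4) -> float:
    """Sentence-level BLEU with uniform n-gram weights, brevity penalty,
    and +1 smoothing on n-grams of order > 1 to avoid zero scores."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        p = overlap / total if n == 1 else (overlap + 1) / (total + 1)
        if p == 0:
            return 0.0
        log_prec += math.log(p) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * math.exp(log_prec)

def back_translate(candidate: str) -> str:
    """Placeholder for the reverse (target->source) NMT model."""
    return candidate  # illustrative stub only

src = "the cat sat on the mat".split()
candidate = "the cat sat on the mat"
reward = sentence_bleu(back_translate(candidate).split(), src)
```

The reward is then used to rank or reweight candidate translations; a candidate whose back-translation reproduces the source exactly receives the maximum score of 1.0.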