Findings of the Association for Computational Linguistics: NAACL 2022 (2022)
DOI: 10.18653/v1/2022.findings-naacl.17

Opportunities for Human-centered Evaluation of Machine Translation Systems

Abstract: Machine translation models are embedded in larger user-facing systems. Although model evaluation has matured, evaluation at the systems level is still lacking. We review literature from both the translation studies and HCI communities about who uses machine translation and for what purposes. We emphasize an important difference in evaluating machine translation models versus the physical and cultural systems in which they are embedded. We then propose opportunities for improved measurement of user-facing trans…

Citations: cited by 3 publications (1 citation statement)
References: 41 publications
“…This presentation, alongside the anthropomorphic bias of deep learning models (Watson, 2019), can perpetuate these opinions including harmful stereotypes. This is a general limitation of NLG models which we are unable to capture using standardized benchmarks alongside intrinsic evaluations, and others have thus called for more work to evaluate models in the physical and cultural context in which they are applied (Liebling et al, 2022; Bhatt et al, 2022). We also note that few, if any, benchmark currently reports the environmental side-effects of training and serving NLG models (Strubell, Ganesh, & McCallum, 2019).…”
Section: Discussion (mentioning)
confidence: 99%