Opportunities for Human-centered Evaluation of Machine Translation Systems

Liebling, Daniel J.; Heller, Katherine; Robertson, Samantha; Deng, Wesley

doi:10.18653/v1/2022.findings-naacl.17

Cited by 3 publications

(1 citation statement)

References 41 publications


“…This presentation, alongside the anthropomorphic bias of deep learning models (Watson, 2019), can perpetuate these opinions including harmful stereotypes. This is a general limitation of NLG models which we are unable to capture using standardized benchmarks alongside intrinsic evaluations, and others have thus called for more work to evaluate models in the physical and cultural context in which they are applied (Liebling et al., 2022; Bhatt et al., 2022). We also note that few, if any, benchmark currently reports the environmental side-effects of training and serving NLG models (Strubell, Ganesh, & McCallum, 2019).…”

Section: Discussion (mentioning)

confidence: 99%

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

Gehrmann, Clark, Sellam. 2023. JAIR.


Evaluation practices in natural language generation (NLG) have many known flaws, but improved evaluation approaches are rarely widely adopted. This issue has become more urgent, since neural generation models have improved to the point where their outputs can often no longer be distinguished based on the surface-level features that older metrics rely on. This paper surveys the issues with human and automatic model evaluations and with commonly used datasets in NLG that have been pointed out over the past 20 years. We summarize, categorize, and discuss how researchers have been addressing these issues and what their findings mean for the current state of model evaluations. Building on those insights, we lay out a long-term vision for evaluation research and propose concrete steps for researchers to improve their evaluation processes. Finally, we analyze 66 generation papers from recent NLP conferences in how well they already follow these suggestions and identify which areas require more drastic changes to the status quo.


Human-Centered Evaluation and Auditing of Language Models

Xiao, Deng, Lam, et al. 2024

Extended Abstracts of the CHI Conference on Human Factors in Computing Systems


No Barrier

Guttikonda, Sanchit, Krishnamoorthy, et al. 2024

Advances in Business Information Systems and Analytics


In the digital realm, language diversity remains a significant hurdle to effective global communication, affecting approximately 60% of internet users worldwide. The aim is to promote inclusive conversation and overcome the language barrier in an online world where people from various backgrounds work together. The chapter revolves around an NMT model, a transformer-based translation architecture that provides real-time, contextually aware translation, along with a fine-tuned front-end chat room crafted for its users. The chat room supports multiple well-known languages with smooth translation, so that communication remains fluid and accurate, which can significantly improve the online community. The chapter also introduces digital twin technology, a concept that digitally mirrors the real world and its processes; this digital twin analysis focuses on user-side preferences, inputs, and contextual meanings. The outcome of the study holds the promise of permanently changing the structure of digital communication across multiple languages in an evolving online world.
