Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems 2023
DOI: 10.1145/3544549.3585667

“I'm” Lost in Translation: Pronoun Missteps in Crowdsourced Data Sets

Abstract: As virtual assistants continue to be taken up globally, there is an ever-greater need for these speech-based systems to communicate naturally in a variety of languages. Crowdsourcing initiatives have focused on multilingual translation of big, open data sets for use in natural language processing (NLP). Yet, language translation is often not one-to-one, and biases can trickle in. In this late-breaking work, we focus on the case of pronouns translated between English and Japanese in the crowdsourced Tatoeba dat…

Cited by 2 publications
(1 citation statement)
References 40 publications
“…Notably, translations of English and Japanese NLP data sets are common, such as the crowdsourcing initiatives of Tatoeba (https://tatoeba.org/en/downloads) and MASSIVE [27]. Yet, biases have been found within these data sets: "missteps" resulting from the crowdsourced translation process [55]. This suggests that translation may be insufficient.…”
Section: Approaching the Co-design Of Vas Cross-culturally
Citation type: mentioning
confidence: 99%