Empirical Analysis of Bias in Voice-based Personal Assistants

Lima, Lanna; Furtado, Vasco; Furtado, Elizabeth; Almeida, Virgílio

doi:10.1145/3308560.3317597

“…Voice variation also plays a role: ASR error distribution differs by speaker background variables such as accent (Zheng et al., 2005), in turn affecting the downstream systems (Harwell, 2018; Lima et al., 2019; Palanica et al., 2019). To emulate speaker variation in the synthetic setting, we use Google English Text-to-Speech to pronounce the XQuAD questions in eight different voices, varying the provided accent and gender settings.…”

Section: Results and Analysis (mentioning)

confidence: 99%

NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

Ravichander, Dalmia, Ryskina et al., 2021

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume


When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions to languages supported by the QA system. While there has been significant community attention devoted to identifying correct answers in passages assuming a perfectly formed question, we show that components in the pipeline that precede an answering engine can introduce varied and considerable sources of error, and performance can degrade substantially based on these upstream noise sources even for powerful pre-trained QA models. We conclude that there is substantial room for progress before QA systems can be effectively deployed, highlight the need for QA evaluation to expand to consider real-world use, and hope that our findings will spur greater community interest in the issues that arise when our systems actually need to be of utility to humans.



NoiseQA: Challenge Set Evaluation for User-Centric Question Answering

Ravichander, Dalmia, Ryskina et al., 2021

Preprint




“…By using video instead of actual interaction with the device, we maintain the highest degree of control over the similarity of interaction between participants and devices, thereby increasing the comparability between prototypes. Using natural language to interact with PAs often leads to voice recognition errors that would not be consistent among participants and would therefore lead to variability in user experiences and, in turn, in evaluation [24]. We developed six videos for each FiPA version, each showcasing the same three successful and three failed interactions.…”

Section: Methods (mentioning)

confidence: 99%

The Effect of Embodied Anthropomorphism of Personal Assistants on User Perceptions

Schneiders, Papachristos, Berkel, 2021

33rd Australian Conference on Human-Computer Interaction


We investigate the impact of anthropomorphism on embodied AI through a study of personal assistants (PAs). The effects of physical embodiment remain underexplored, even as the consumer market for PAs shows increasing diversity in the physical appearance of these products. We designed three fictional personal assistants with varying levels of embodied anthropomorphism. We validated that our prototypes differed significantly in levels of anthropomorphism (N = 26). We developed a set of identical videos for each device, demonstrating realistic end-user interaction across six scenarios. Using a between-subjects video survey study (N = 150), we evaluate the impact of different levels of embodied anthropomorphism on the perception of personal assistants. Our results show that while anthropomorphism did not significantly affect the perception of Overall Goodness, it affected perceptions of Perceived Intelligence, Likeability, and the device's Pragmatic Qualities. Finally, we discuss the implications of the identified relationships between anthropomorphism and user confidence in embodied AI systems.


Empirical Analysis of Bias in Voice-based Personal Assistants

Cited by 36 publications

References 5 publications
