FPI: Failure Point Isolation in Large-scale Conversational Assistants

Khaziev, Rinat; Shahid, Usman; Röding, Tobias; Chada, Rakesh; Kapanci, Emir; Natarajan, Pradeep

doi:10.18653/v1/2022.naacl-industry.17

Cited by 6 publications

(2 citation statements)

References 11 publications


“…Such similarities motivate us to draw parallels between the NLP robustness literature and HCI perspectives of system failures. By understanding how different types of failures affect trust in voice assistants overall, we can then try to pinpoint the underlying NLP components that are the root cause of the most critical failures that erode trust [30]. Technical solutions can then be leveraged to improve the robustness of the most critical parts of the system in order to increase user trust and long-term engagement most efficiently.…”

Section: NLP Approaches to Voice Assistant Failures (mentioning)

confidence: 99%

A Mixed-Methods Approach to Understanding User Trust after Voice Assistant Failures

Baughan, Mercurio, Liu et al. 2023

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems


Despite huge gains in performance in natural language understanding via large language models in recent years, voice assistants still often fail to meet user expectations. In this study, we conducted a mixed-methods analysis of how voice assistant failures affect users' trust in their voice assistants. To illustrate how users have experienced these failures, we contribute a crowdsourced dataset of 199 voice assistant failures, categorized across 12 failure sources. Relying on interview and survey data, we find that certain failures, such as those due to overcapturing users' input, derail user trust more than others. We additionally examine how failures impact users' willingness to rely on voice assistants for future tasks. Users often stop using their voice assistants for specific tasks that result in failures for a short period of time before resuming similar usage. We demonstrate the importance of low-stakes tasks, such as playing music, towards building trust after failures. CCS CONCEPTS • Human-centered computing → Empirical studies in HCI.


“…the survey in (Hedderich et al, 2020)). A number of works identify utterances with processing errors through offline analysis (Sethi et al, 2021; Gupta et al, 2021; Chada et al, 2021; Khaziev et al, 2022). These approaches however still need human annotation in an active learning loop to improve production models.…”

Section: Related Work (mentioning)

confidence: 99%

Improving Large-Scale Conversational Assistants using Model Interpretation based Training Sample Selection

Schroedl, Kumar, Hajebi et al. 2022

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track


Natural language understanding (NLU) models are a core component of large-scale conversational assistants. Collecting training data for these models through manual annotations is slow and expensive, which impedes the pace of model improvement. We present a three-stage approach to address this challenge: First, we identify a large set of relatively infrequent utterances from live traffic where the users implicitly communicated satisfaction with a response (such as by not interrupting), along with the existing model outputs as candidate annotations. Second, we identify a small subset of these utterances using Integrated Gradients-based importance scores computed with the current models. Finally, we augment our training sets with these utterances and retrain our models. We demonstrate the effectiveness of our approach in a large-scale conversational assistant, processing billions of utterances every week. By augmenting our training set with just 0.05% more utterances through our approach, we observe statistically significant improvements for infrequent tail utterances: a 0.45% reduction in semantic error rate (SemER) in offline experiments, and a 1.23% reduction in defect rates in online A/B tests.


Performance and Failure Cause Estimation for Machine Learning Systems in the Wild

Liu, Khan, Niu et al. 2023

Lecture Notes in Computer Science
