Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society
DOI: 10.1145/3278721.3278777

Ethical Challenges in Data-Driven Dialogue Systems

Abstract: The use of dialogue systems as a medium for human-machine interaction is an increasingly prevalent paradigm. A growing number of dialogue systems use conversation strategies that are learned from large datasets. There are well-documented instances where interactions with these systems have resulted in biased or even offensive conversations due to the data-driven training process. Here, we highlight potential ethical issues that arise in dialogue systems research, including: implicit biases in data-driven system…

Cited by 121 publications (114 citation statements)
References 30 publications
“…Then, given a list of hot queries, a word-embedding-based function was used to find the queries similar to the responses. Henderson et al. (2018) suggest that, due to their subjective nature and goal of mimicking human behaviour, data-driven dialogue models are susceptible to implicitly encoding underlying biases in human dialogue, similar to related studies on biased lexical semantics derived from large corpora (Caliskan et al., 2017; Bolukbasi et al., 2016). By training a model on clean data, we aim to verify whether these models are able to provide more appropriate responses.…”
Section: Systems Evaluated (mentioning)
confidence: 56%
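The "word-embedding-based function" mentioned in the excerpt above can be pictured with a minimal sketch: averaged word vectors compared by cosine similarity to match an incoming query against a list of stored queries. The tiny embedding table, the averaging scheme, and the example queries below are illustrative assumptions, not the cited system's actual implementation.

```python
# Illustrative sketch only: averaged-word-vector query matching.
# The 4-dimensional embedding table is made up for demonstration.
import numpy as np

EMBED = {
    "cheap":   np.array([0.9, 0.1, 0.0, 0.2]),
    "hotel":   np.array([0.1, 0.8, 0.3, 0.0]),
    "budget":  np.array([0.8, 0.2, 0.1, 0.1]),
    "room":    np.array([0.2, 0.7, 0.4, 0.1]),
    "weather": np.array([0.0, 0.1, 0.9, 0.6]),
    "today":   np.array([0.1, 0.0, 0.8, 0.7]),
}

def embed(text: str) -> np.ndarray:
    """Average the vectors of known words; zeros if none are known."""
    vecs = [EMBED[w] for w in text.lower().split() if w in EMBED]
    return np.mean(vecs, axis=0) if vecs else np.zeros(4)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def most_similar(query: str, hot_queries: list[str]) -> str:
    """Return the stored query whose embedding is closest to the input."""
    q = embed(query)
    return max(hot_queries, key=lambda h: cosine(q, embed(h)))

hot_queries = ["cheap hotel room", "weather today"]
print(most_similar("budget room", hot_queries))  # -> "cheap hotel room"
```

A real system would use pretrained embeddings and a proper vector index, but the cosine-over-averaged-vectors idea is the same.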
“…This includes our own in-house bot, which was trained on clean data. As such, the problem is not that the bot reflects bias in the data (Henderson et al., 2018), but how humans construct contextual meaning. Some (of the less offensive) examples include: Prompt: "I love watching porn."…”
Section: Prompt Context (mentioning)
confidence: 99%
“…To characterize the types of paraphrasing issues, two authors of this paper investigated the crowd-sourced paraphrases and recognized 5 primary categories of paraphrasing issues. However, we only considered paraphrase-level issues related to the validity of a paraphrase, without considering dataset-level quality issues such as lexical diversity (Negri et al., 2012) and bias (Henderson et al., 2018).…”
Section: Common Paraphrasing Issues (mentioning)
confidence: 99%
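To make the dataset-level notion of lexical diversity concrete, here is a minimal sketch that scores a set of paraphrases with a type-token ratio and a distinct-bigram ratio; the specific metrics and the sample paraphrases are assumptions chosen for illustration, not the measures used by Negri et al. (2012).

```python
# Illustrative lexical-diversity measures for a paraphrase set.
from itertools import chain

def tokens(text: str) -> list[str]:
    return text.lower().split()

def type_token_ratio(paraphrases: list[str]) -> float:
    """Unique tokens / total tokens across the whole set."""
    all_tokens = list(chain.from_iterable(tokens(p) for p in paraphrases))
    return len(set(all_tokens)) / len(all_tokens) if all_tokens else 0.0

def distinct_bigram_ratio(paraphrases: list[str]) -> float:
    """Unique bigrams / total bigrams, a common proxy for diversity."""
    bigrams = []
    for p in paraphrases:
        toks = tokens(p)
        bigrams.extend(zip(toks, toks[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0

paraphrases = [
    "book me a cheap hotel",
    "book me a cheap hotel please",
    "find an inexpensive room for tonight",
]
print(type_token_ratio(paraphrases), distinct_bigram_ratio(paraphrases))
```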
“…A lack of variation in training samples can result in incorrect intent detection and, consequently, execution of undesirable tasks (e.g., booking an expensive hotel instead of a cheap room) (Henderson et al., 2018). Likewise, quality issues in the training samples can lead to unmitigated disasters (Neff and Nagy, 2016), as happened with Microsoft's Tay, which produced a huge number of offensive commentaries due to biases in the training data (Henderson et al., 2018). It is therefore not surprising that research and development into training data acquisition for bots has received significant consideration (Campagna et al., 2017; Kang et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%
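As a toy illustration of why a lack of variation in training samples can derail intent detection (not taken from any of the cited works), the sketch below uses a naive word-overlap intent matcher trained on two narrow sets of example utterances; an unseen but reasonable phrasing lands on the wrong intent, mirroring the cheap-room-versus-expensive-hotel failure described above.

```python
# Toy illustration: a nearest-example intent matcher trained on narrow
# samples mis-handles an unseen phrasing, showing why variation in
# training utterances matters. Intents and examples are hypothetical.
TRAINING = {
    "book_cheap_room":   ["book a cheap room", "find a cheap room"],
    "book_luxury_hotel": ["book a luxury hotel", "reserve an expensive hotel"],
}

def overlap(a: str, b: str) -> int:
    return len(set(a.lower().split()) & set(b.lower().split()))

def predict_intent(utterance: str) -> str:
    """Pick the intent whose best training example shares the most words."""
    return max(
        TRAINING,
        key=lambda intent: max(overlap(utterance, ex) for ex in TRAINING[intent]),
    )

# A request for an affordable hotel shares more words with the luxury
# examples ("an", "hotel") than with the cheap-room ones ("find"), so the
# toy matcher picks the wrong task -- the failure mode described above.
print(predict_intent("find an affordable hotel"))  # -> "book_luxury_hotel"
```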