2021
DOI: 10.48550/arxiv.2108.11830
Preprint

Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

Abstract: Dialogue models trained on human conversations inadvertently learn to generate offensive responses. Moreover, models can insult anyone by agreeing with an offensive context. To understand the dynamics of contextually offensive language, we study the stance of dialogue model responses in offensive Reddit conversations. Specifically, we crowd-annotate TOXI-CHAT, a new dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance. Our analysis reveals that 42% of user responses ag…

Cited by 6 publications (6 citation statements) · References 27 publications
Citing years: 2022, 2022, 2024, 2024

“…Most methods have been tested on architectures tailored for the English language (Colombo et al. 2022; Arora, Huang, and He 2021). With inclusivity and diversity in mind (Ruder 2022; van Esch et al. 2022), it is necessary to assess the performance of old and new OOD detection methods on a variety of languages (Srinivasan et al. 2021; de Vries, van Cranenburgh, and Nissim 2020; Baheti et al. 2021; Zhang et al. 2022).…”
Section: Limitation of Existing Benchmarks (mentioning)
confidence: 99%
“…Most methods have been tested on architectures tailored for the English language (Colombo et al., 2022a; Li et al., 2021; Arora et al., 2021). With inclusivity and diversity in mind (Ruder, 2022; van Esch et al., 2022), it is necessary to assess the performance of old and new OOD detection methods on a variety of languages (Srinivasan et al., 2021; de Vries et al., 2020; Baheti et al., 2021; Zhang et al., 2022).…”
Section: Limitation of Existing Benchmarks (mentioning)
confidence: 99%
“…Safety assessments are also conducted by constructing contexts based on templates or collected datasets. For example, some past work finds that conversational models tend to become more unsafe when faced with specific contexts such as toxic or biased language [1511], harassment [1512], and political topics [1513]. Also, inspired by LAMA [92], some recent works probe the safety of language models using intra-sentence (cloze) tests [1116, 1514, 704, 1515].…”
Section: Safety and Ethical Risk (mentioning)
confidence: 99%