2020
DOI: 10.48550/arxiv.2008.09706
Preprint

Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology

Abstract: Conversational interfaces are increasingly popular as a way of connecting people to information. Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on the topic of recognizing and classifying inappropria…

Cited by 4 publications (4 citation statements) | References 52 publications

“…A recent workshop on trolling, aggression and cyberbullying (Kumar et al., 2020) proposed tasks on aggression identification and gendered aggression identification. Zhang et al. (2020) propose a wider-ranging hierarchical taxonomy of malevolent dialogue, defined as "a system-generated response that is grounded in negative emotion, inappropriate behavior or unethical value basis in terms of content and dialogue acts." Park et al. (2018) measure gender biases in models trained with different abusive language datasets, and propose three methods to reduce bias: debiased word embeddings, gender swap data augmentation, and fine-tuning with a larger corpus.…”
Section: Scope of Abusive Content
mentioning
confidence: 99%
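Of the three bias-reduction methods above, gender swap data augmentation is the most mechanical: every training example is duplicated with gendered terms exchanged, so the classifier sees both variants with the same label. A minimal Python sketch, assuming whitespace-tokenized lowercase input; the word-pair list and helper names are illustrative, not taken from the cited work:

```python
# Gender swap data augmentation (illustrative sketch, not the cited
# implementation). Each example is duplicated with gendered tokens
# swapped and the label kept, so the model cannot tie the abuse label
# to one gender's terms alone.
GENDER_PAIRS = {
    "he": "she", "she": "he",
    "him": "her", "her": "him",  # "her" is ambiguous (him/his); the sketch ignores this
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
}

def gender_swap(text: str) -> str:
    """Swap gendered tokens in a whitespace-tokenized string."""
    return " ".join(GENDER_PAIRS.get(tok, tok) for tok in text.lower().split())

def augment(dataset: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Return the dataset plus a gender-swapped copy of every example."""
    return dataset + [(gender_swap(text), label) for text, label in dataset]

print(augment([("he is awful", 1)]))
# [('he is awful', 1), ('she is awful', 1)]
```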
“…Many existing mitigations rely on the ability to detect problematic content, often centred on content written by humans on social media platforms such as Twitter (e.g. Waseem and Hovy, 2016; Wang et al., 2020; Zampieri et al., 2019, 2020; Zhang et al., 2020), Facebook (Glavaš et al., 2020; Zampieri et al., 2020), or Reddit (Han and Tsvetkov, 2020; Zampieri et al., 2020). However, conversational systems may not necessarily have the same patterns as social media content (Cercas Curry et al., 2021).…”
Section: Offensive Content
mentioning
confidence: 99%
“…Offensive system responses: For offensive content generated by the systems themselves, Ram et al. (2017) use keyword matching and machine learning methods to detect system responses that are profane, sexual, racially inflammatory, otherwise hateful, or violent. Zhang et al. (2020b) develop a hierarchical classification framework for "malevolent" responses in dialogs (although their data is from Twitter rather than human-agent conversations). Other work applies the same classifier used for detecting unsafe user input to system responses, in addition to proposing other methods of avoiding unsafe output (see below).…”
Section: Generating Offensive Content (Instigator Effect)
mentioning
confidence: 99%
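The keyword-matching plus machine-learning setup attributed to Ram et al. above can be sketched as a two-stage filter: a fast blocklist check backed by a trained classifier for responses the blocklist misses. The blocklist contents and the TF-IDF plus logistic regression model below are assumptions for illustration, not the cited system:

```python
# Two-stage offensive-response filter (illustrative sketch): a keyword
# blocklist catches obvious cases; a trained classifier covers the rest.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

BLOCKLIST = {"slur1", "slur2"}  # placeholder tokens, not a real lexicon

def keyword_flag(response: str) -> bool:
    """Stage 1: exact-token blocklist match."""
    return any(tok in BLOCKLIST for tok in response.lower().split())

# Tiny stand-in training set: (response, label) with 1 = offensive.
train = [("you are wonderful", 0), ("i will hurt you", 1),
         ("have a nice day", 0), ("i hate you all", 1)]
texts, labels = zip(*train)

# Stage 2: a simple bag-of-words classifier stands in for whatever
# model the cited system actually used.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

def is_offensive(response: str) -> bool:
    """Flag a system response if either stage fires."""
    return keyword_flag(response) or bool(clf.predict([response])[0])

print(is_offensive("i hate you all"))  # True
```

In practice the classifier stage would be a fine-tuned neural model and the blocklist a curated lexicon, but the two-stage structure is the point of the sketch.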