Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.259
From the Detection of Toxic Spans in Online Discussions to the Analysis of Toxic-to-Civil Transfer

Abstract: We study the task of toxic spans detection, which concerns the detection of the spans that make a text toxic, when detecting such spans is possible. We introduce a dataset for this task, TOXICSPANS, which we release publicly. By experimenting with several methods, we show that sequence labeling models perform best. Moreover, methods that add generic rationale extraction mechanisms on top of classifiers trained to predict if a post is toxic or not are also surprisingly promising. Finally, we use TOXICSPANS and …
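The sequence-labeling framing mentioned in the abstract can be illustrated with a minimal sketch (not the paper's implementation): given per-token toxicity labels predicted by a tagger, recover the character offsets of the toxic spans, which is the output format used by span-level datasets such as TOXICSPANS. The helper name and the example post below are hypothetical.

```python
def toxic_char_offsets(text, token_labels):
    """token_labels: list of (token, is_toxic) pairs, tokens in text order.
    Returns the sorted character offsets that fall inside toxic tokens."""
    offsets = []
    cursor = 0
    for token, is_toxic in token_labels:
        start = text.index(token, cursor)  # locate this token in the text
        if is_toxic:
            offsets.extend(range(start, start + len(token)))
        cursor = start + len(token)
    return offsets

post = "You are a total idiot"
labels = [("You", False), ("are", False), ("a", False),
          ("total", False), ("idiot", True)]
print(toxic_char_offsets(post, labels))  # -> [16, 17, 18, 19, 20]
```

The returned offsets cover the substring "idiot", matching the character-offset span annotations that span-level toxicity datasets typically use.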

Cited by 10 publications (6 citation statements)
References 23 publications
“…Extensive prior works attempted to define and predict inappropriate behaviors online, yet framed in various ways, such as explicit attacks (Wulczyn, Thain, and Dixon 2016), online abuse (Mishra, Yannakoudakis, and Shutova 2019), toxicity (Pavlopoulos, Malakasiotis, and Androutsopoulos 2017; Zampieri et al. 2019; Pavlopoulos et al. 2022; Brassard-Gourdeau and Khoury 2019), and hate speech (Davidson et al. 2017; Schmidt and Wiegand 2017; Mosca, Wich, and Groh 2021). For example, Davidson et al. (2017) trained a multi-class model to differentiate between everyday offensive language and serious hate speech.…”
Section: Quantifying Incivility and Similar Concepts
confidence: 99%
“…Most of the research conducted in the area of toxic span classification was on English-language corpora. This research began with the aforementioned SemEval-2021 task or with its follow-up work on detoxifying posts [27] using the ToxicSpans dataset. Many of these studies focused on the explainability of existing classification methods [28,29].…”
Section: Offensive and Toxic Spans' Datasets
confidence: 99%
“…In the explanations-as-input experiments (see section 5), the complete tokenized post is taken as the rationale for normal posts. ToxicSpans dataset: The ToxicSpans (Pavlopoulos et al., 2022) dataset is a subset (containing 11,006 samples labelled toxic) of the Civil Comments dataset (1.2M posts). The dataset also contains the toxic spans, i.e., the regions of the text found to be toxic.…”
Section: Datasets and Metrics
confidence: 99%
“…We, for the first time, introduce several prompt variations and input instructions to probe two of the LLMs (GPT-3.5 and text-davinci) across three datasets: HateXplain (Mathew et al., 2021), implicit hate (ElSherief et al., 2021) and ToxicSpans (Pavlopoulos et al., 2022). Note that all three datasets contain ground-truth explanations in the form of either rationales (Mathew et al., 2021; Pavlopoulos et al., 2022) or implied statements (ElSherief et al., 2021) that tell why an annotator took a particular labelling decision. In addition, two of the datasets also contain information about the target/victim community against whom the hate speech was hurled.…”
Section: Introduction
confidence: 99%