Proceedings of the 2nd Workshop on Abusive Language Online (ALW2) 2018
DOI: 10.18653/v1/w18-5105

Challenges for Toxic Comment Classification: An In-Depth Error Analysis

Abstract: Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task's challenges, others remain unsolved, and directions for further research are needed. To this end, we compare different deep learning and shallow approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to …
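To make the ensembling idea in the abstract concrete, here is a minimal sketch assuming simple probability averaging (soft voting) over heterogeneous classifiers. The specific models and toy data are illustrative placeholders, not the authors' exact setup:

```python
# Sketch: soft-voting ensemble for binary toxicity classification.
# Models and data are illustrative; the paper combines several deep
# and shallow approaches, which two sklearn models stand in for here.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

train_texts = ["have a great day", "you are an idiot",
               "thanks for the help", "shut up, moron"]
train_labels = np.array([0, 1, 0, 1])  # 1 = toxic

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

# Heterogeneous ensemble members (placeholders for a deep/shallow mix).
members = [
    LogisticRegression(max_iter=1000).fit(X_train, train_labels),
    MultinomialNB().fit(X_train, train_labels),
]

def ensemble_predict_proba(texts):
    """Average the members' predicted toxicity probabilities."""
    X = vectorizer.transform(texts)
    return np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)

print(ensemble_predict_proba(["what a moron"]))  # averaged toxicity score
```

Soft voting is one common way such ensembles are built; the paper's actual combination scheme may differ.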

Cited by 167 publications (109 citation statements) | References 33 publications

“…Humor, irony and sarcasm. Supposedly humorous, ironic or sarcastic abusive content is often viewed as a source of classification error (Nobata, Thomas, Mehdad, Chang, & Tetreault, 2016; van Aken, Risch, Krestel, & Löser, 2018). However, drawing on critical studies of prejudice and hate, we propose that such content is still abusive (Weaver, 2010).…”
Section: Recognizing Abusive Content
confidence: 91%
“…Annotation is a notoriously difficult task, reflected in the low levels of inter-annotator agreement reported by most publications, particularly on more complex multi-class tasks (Sanguinetti, Poletto, Bosco, Patti, & Stranisci, 2018). Notably, van Aken et al. suggest that Davidson et al.'s widely used hate and offensive language dataset has up to 10% of its data mislabeled (van Aken et al., 2018). Few publications provide details of their annotation process or annotation guidelines.…”
Section: Creating and Sharing Datasets
confidence: 99%
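For context on the agreement figures mentioned above: inter-annotator agreement is typically quantified with a chance-corrected statistic such as Cohen's kappa. A minimal sketch, with made-up label sequences:

```python
# Sketch: chance-corrected agreement between two annotators via
# Cohen's kappa. The label sequences are made-up examples.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["toxic", "ok", "ok", "toxic", "ok", "toxic"]
annotator_b = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]

# kappa = 1.0 means perfect agreement; 0.0 means chance-level agreement.
print(cohen_kappa_score(annotator_a, annotator_b))
```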
“…Despite a public effort to recognize and reduce, if not eliminate, their occurrence (Kim, 2013; Neff, 2015), there has been no computational work to detect and analyze MAS at scale. Instead, much of the recent work has focused on explicitly toxic language (e.g., Waseem et al., 2017), with surveys of the area also overlooking the important and challenging task of recognizing this subtle toxicity (van Aken et al., 2018; Salminen et al., 2018; Fortuna and Nunes, 2018). Indeed, as Figure 1 suggests, current popular tools for toxic language detection do not recognize the toxicity of MAS and, further, sentiment tools can label these comments as being positive.…”
Section: Introduction
confidence: 99%
“…Table 3: Surveyed papers and their error sample sizes.

Paper                         Sample size
Seo et al., 2016              50
Kundu and Ng, 2018            50
Hu et al., 2018               50
Min et al., 2018              50
Weissenborn et al., 2017      55
Chen et al., 2016             100
Min et al., 2017              100
Wadhwa et al., 2018           100
Fader et al., 2013            100
van Aken et al., 2018         200
Average                       85.5
…”
Section: Paper
confidence: 99%
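As a quick check on the table's reported mean: (4 × 50 + 55 + 4 × 100 + 200) / 10 = 855 / 10 = 85.5, matching the stated average.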