2023
DOI: 10.48550/arxiv.2303.10430
Preprint

NoisyHate: Benchmarking Content Moderation Machine Learning Models with Human-Written Perturbations Online

Abstract: Online texts with toxic content are a clear threat to users of social media in particular and to society in general. Although many platforms have adopted various measures (e.g., machine learning based hate-speech detection systems) to diminish their effect, toxic content writers have also attempted to evade such measures by using cleverly modified toxic words, so-called human-written text perturbations. Therefore, to help AI-based detection systems recognize those perturbations, prior methods have developed…
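The abstract's core idea is that small, human-written edits to toxic words can slip past detection. The following is a minimal, hypothetical sketch (not from the paper): it shows how a character-substitution perturbation of the kind evasive writers use can defeat a naive exact-match keyword filter. The word list, lookalike map, and helper names (perturb, naive_keyword_filter) are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch (not from the paper): a simple character-level
# perturbation of a toxic word evading a naive keyword-based filter.
# The blocklist and lookalike map below are toy illustrations only.

BLOCKLIST = {"idiot", "stupid"}  # toy stand-in for a real toxic-word lexicon

LOOKALIKES = {"i": "1", "o": "0", "s": "$"}  # common human-written substitutions

def perturb(word: str) -> str:
    """Replace letters with visually similar characters, as evasive writers do."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in word)

def naive_keyword_filter(text: str) -> bool:
    """Return True if any blocklisted word appears verbatim in the text."""
    tokens = text.lower().split()
    return any(tok.strip(".,!?") in BLOCKLIST for tok in tokens)

original = "you are an idiot"
perturbed = " ".join(perturb(w) for w in original.split())  # "y0u are an 1d10t"

print(naive_keyword_filter(original))   # True  -> caught by exact matching
print(naive_keyword_filter(perturbed))  # False -> evades the exact-match filter
```

A benchmark of real human-written perturbations, as the paper proposes, lets learned detectors (rather than exact-match filters like the toy one above) be evaluated against exactly this kind of evasion.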

Cited by 1 publication (1 citation statement) | References: 27 publications
“…Prior works on toxic content detection can be categorized into two types. One line of research focuses on creating benchmark datasets for toxic content detection, either by crowdsourcing and annotating human-written text (Ye, Le, and Lee 2023; Sap et al. 2019; Vidgen et al. 2020) or by leveraging ML-based approaches to generate high-quality toxic datasets in a scalable way (Hartvigsen et al. 2022). Another line proposes novel approaches to fine-tune LMs on toxic datasets.…”
Section: Related Work, Toxic Content Detection (mentioning; confidence: 99%)