2021
DOI: 10.48550/arxiv.2109.00591
Preprint

Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech

Abstract: Automatic hate speech detection is hampered by the scarcity of labeled datasets, leading to poor generalization. We employ pretrained language models (LMs) to alleviate this data bottleneck. We utilize the GPT LM for generating large amounts of synthetic hate speech sequences from available labeled examples, and leverage the generated data in fine-tuning large pretrained LMs on hate detection. An empirical study using the models of BERT, RoBERTa and ALBERT, shows that this approach improves generalization sign…
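The abstract describes a two-stage pipeline: generate synthetic hate-speech sequences with a GPT language model from labeled seed examples, then fine-tune a large pretrained encoder (BERT, RoBERTa, or ALBERT) on the augmented training set. The sketch below illustrates that idea, assuming GPT-2 and BERT via the Hugging Face transformers library; the model choices, sampling parameters, and single gradient step are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of the generate-then-fine-tune pipeline (assumptions:
# GPT-2 as the generator, BERT as the detector, Hugging Face transformers).
import torch
from transformers import (
    GPT2LMHeadModel, GPT2Tokenizer,
    BertForSequenceClassification, BertTokenizer,
)

# 1) Generate synthetic sequences from a labeled seed example used as a prompt.
gen_tok = GPT2Tokenizer.from_pretrained("gpt2")
gen_lm = GPT2LMHeadModel.from_pretrained("gpt2")

seed = "example labeled sequence used as a generation prompt"  # placeholder seed
inputs = gen_tok(seed, return_tensors="pt")
outputs = gen_lm.generate(
    **inputs,
    do_sample=True,            # sampling yields diverse synthetic sequences
    top_p=0.9,
    max_new_tokens=40,
    num_return_sequences=5,
    pad_token_id=gen_tok.eos_token_id,
)
synthetic = [gen_tok.decode(o, skip_special_tokens=True) for o in outputs]

# 2) Fine-tune a pretrained encoder (BERT here) on the enlarged training set.
clf_tok = BertTokenizer.from_pretrained("bert-base-uncased")
clf = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

batch = clf_tok(synthetic, padding=True, truncation=True, return_tensors="pt")
labels = torch.ones(len(synthetic), dtype=torch.long)  # label inherited from the seed
loss = clf(**batch, labels=labels).loss
loss.backward()  # one illustrative step; a real run uses an optimizer/Trainer loop
```

In practice the generated sequences would be filtered and mixed with the original labeled data before fine-tuning; the sketch only shows the data flow from generator to detector.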

Cited by 1 publication (1 citation statement)
References 5 publications
“…Recently, Transformer-based architectures (Mozafari et al., 2019; Aluru et al., 2020; Samghabadi et al., 2020; Salminen et al., 2020; Qian et al., 2021; Kennedy et al., 2020; Arviv et al., 2021) achieved significant improvements over RNN and CNN models (Zhang et al., 2016; Gambäck and Sikdar, 2017; Del Vigna et al., 2017; Park and Fung, 2017). In an effort to mitigate the need for extensive annotation some works use transformers to generate more samples, e.g., (Vidgen et al., 2020b; Wullach et al., 2020, 2021). Zhou et al. (2021) integrate features from external resources to support the model performance.…”
Section: Related Work
Confidence: 99%