2017
DOI: 10.1609/icwsm.v11i1.14955

Automated Hate Speech Detection and the Problem of Offensive Language

Abstract: A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. We used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We use crowd-sourcing to label a sample of the…
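The abstract's point about lexical methods can be illustrated with a minimal sketch. This is not the authors' code: the lexicon terms (`termx`, `termy`), the sample messages, and their labels are invented placeholders, and the helper names `lexical_flag` and `precision` are hypothetical. It only shows why flagging every message that contains a lexicon term tends to produce false positives and therefore low precision.

```python
# Hypothetical placeholder lexicon; a real lexicon would hold actual terms.
LEXICON = {"termx", "termy"}


def lexical_flag(message: str, lexicon: set[str] = LEXICON) -> bool:
    """Flag a message as hate speech if any token appears in the lexicon."""
    tokens = message.lower().split()
    return any(tok in lexicon for tok in tokens)


def precision(predicted: list[bool], actual: list[bool]) -> float:
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return tp / (tp + fp) if (tp + fp) else 0.0


# Invented toy data: (message, is_hate_speech). Labels are assumptions
# made up for illustration, not drawn from any corpus.
SAMPLES = [
    ("termx people should be banned", True),
    ("my friends call me termx as a joke", False),
    ("that song says termy a lot", False),
    ("nice weather today", False),
]

predicted = [lexical_flag(message) for message, _ in SAMPLES]
actual = [label for _, label in SAMPLES]

# Three messages contain a lexicon term, but only one is labeled hateful,
# so the keyword rule yields one true positive and two false positives.
print(f"precision = {precision(predicted, actual):.2f}")
```

On this toy data the keyword rule flags every message containing a lexicon term, including the two labeled as merely offensive or quoted usage, which is the failure mode the abstract describes.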

Cited by 1,435 publications (533 citation statements)
References 9 publications
“…The Davidson corpus (Davidson et al, 2017) is a tweet corpus annotated in terms of hate speech, offensive speech or neither. The corpus contains 24,802 tweets: 76% are offensive, 11.4% are hateful, and 16.6% are neither.…”
Section: Methods (mentioning)
confidence: 99%
“…Waseem and Hovy (2016) employed character-level features with logistic regression to classify tweets. Davidson et al (2017) classified tweets using word-level features, part-of-speech, sentiment and meta-data of tweets with a logistic regression classifier. Other hard-coded features have been used for hate speech detection, such as user features (Fehn Unsvåg and Gambäck, 2018).…”
Section: Related Work (mentioning)
confidence: 99%
“…Most of hate speech and offensive language corpora are proposed for the English language [2, 15, 44–47]. For the French language, a corpus of Facebook and Twitter annotated data for Islamophobia, sexism, homophobia, religion intolerance and disability detection was also proposed [48, 49].…”
Section: Word Generalization (mentioning)
confidence: 99%
“…The data sets used in this research are grouped into multiclass ([6]) and binary classifications ([12]).…”
Section: Comparative Analysis (mentioning)
confidence: 99%