Proceedings of the 25th International Conference on World Wide Web 2016
DOI: 10.1145/2872427.2883062

Abusive Language Detection in Online User Content

Abstract: Detection of abusive language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions; however, these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech in online user comments from two domains, which outperforms a state-of-the-art deep learning approach. We also develop a cor…

Cited by 853 publications (804 citation statements)
References 18 publications
“…The top performing baseline and current state-of-the-art, Nobata et al (2016), which consists of a comprehensive combination of a range of different features, is bested by NBSVM using solely character n-grams (77 F- For both NBSVM and RNNLM methods, character n-grams outperform their token counterparts (7 and 3 points F-1 score respectively). As most prior work has made use of blacklists and word ngrams, this proves to be an effective method for improving performance.…”
Section: Results (mentioning)
confidence: 99%
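
The contrast between character n-grams and token n-grams drawn in the statement above can be illustrated with a minimal sketch. It is not the feature pipeline of the cited paper or of Nobata et al. (2016); the helper names `char_ngrams` and `token_ngrams` and the example strings are assumptions made for illustration only.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def token_ngrams(text, n):
    """Count overlapping whitespace-token n-grams (hypothetical helper)."""
    tokens = text.split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# An obfuscated spelling shares no whole token with the plain spelling,
# but it still shares some character trigrams.
plain, obfuscated = "stupid", "stup1d"

shared_chars = set(char_ngrams(plain, 3)) & set(char_ngrams(obfuscated, 3))
shared_tokens = set(token_ngrams(plain, 1)) & set(token_ngrams(obfuscated, 1))

print(sorted(shared_chars))   # character trigrams common to both spellings
print(sorted(shared_tokens))  # whole tokens common to both spellings
```

In this toy case the two spellings overlap on character trigrams but not on tokens, which is one plausible reason character-level features can be more robust to misspellings than word lists. The sketch makes no claim about the scores reported in the quoted passage.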
“…The top performing baseline and current state-of-the-art, Nobata et al (2016), which consists of a comprehensive combination of a range of different features, is bested by NBSVM using solely character n-grams (77 F- Table 1) outperforms all other methods in all measures save for recall. This shows that by increasing the number of relevant features we can improve precision with just a small loss in recall.…”
Section: Results (mentioning)
confidence: 99%