2018
DOI: 10.31234/osf.io/hqjxn
Preprint

Introducing the Gab Hate Corpus: Defining and applying hate-based rhetoric to social media posts at scale

Abstract: The growing prominence of online hate speech is a threat to a safe and just society. This endangering phenomenon requires collaboration across the sciences in order to generate evidence-based knowledge of, and policies for, the dissemination of hatred in online spaces. To foster such collaborations, here we present the Gab Hate Corpus (GHC), consisting of 27,665 posts from the social network service gab.ai, each annotated by a minimum of three trained annotators. Annotators were trained to label posts accordin…


Cited by 38 publications (44 citation statements)
References 34 publications
“…Lastly, especially in the purely predictive setting, these feature-based methods are best used as baselines for more sophisticated models. For example, Kennedy, Atari, Mostafazadeh Davani, Yeh, et al. (2020) compared a feature-based model (using TF-IDF vectors) to a leading method in NLP for predicting whether social media posts contained hate-based rhetoric. Table 4.…”
Section: Methods For Feature-based Supervision (mentioning)
confidence: 99%
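The feature-based baseline this excerpt refers to can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the cited authors' pipeline; the file name, column names (`text`, `hate`), and hyperparameters are assumptions.

```python
# Minimal sketch of a TF-IDF + logistic-regression baseline for binary
# hate-speech classification, in the spirit of the feature-based models
# the excerpt describes. Column names, file name, and hyperparameters
# are illustrative assumptions, not taken from the cited work.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical file with one post per row and a binary "hate" label.
df = pd.read_csv("ghc_annotations.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["hate"], test_size=0.2, random_state=42, stratify=df["hate"]
)

baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
baseline.fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))
```

A baseline of this kind is then typically compared against a fine-tuned transformer on the same split, which is the kind of comparison the excerpt attributes to Kennedy et al. (2020).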
“…Given this, a precision score of 0.75 (one of four messages flagged as hate speech will be done so incorrectly) represents a reasonable level of performance, and favourable in relation to comparable models (e.g., Davidson et al., 2017; Kennedy et al., 2020).…”
Section: Automatic Detection Of Hate Speech In Social Media Posts (mentioning)
confidence: 97%
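The parenthetical gloss of the 0.75 score follows from the standard definition of precision (this derivation is a generic check, not drawn from the cited paper):

```latex
\text{precision} = \frac{TP}{TP + FP} = 0.75
\;\Longrightarrow\;
\frac{FP}{TP + FP} = 1 - 0.75 = 0.25
```

That is, roughly one in every four messages flagged as hate speech is a false positive.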
“…[Table: Dataset, Prediction Task, Bias]
GHC (Kennedy et al., 2018): Hate, Group Identifier
Stormfront (de Gibert et al., 2018): Hate, Group Identifier
DWMW (Davidson et al., 2017): Toxicity, AAVE Dialect
FDCL (Founta et al., 2018): Toxicity, AAVE Dialect
BiasBios (De-Arteaga et al., 2019): Occupation, Gender Stereotyping
OntoNotes 5.0 (Weischedel et al., 2013): Coreference, Gender Stereotyping
This is a key test of UBM's viability for widespread application.…”
Section: Dataset (mentioning)
confidence: 99%
“…This bias refers to higher false positive rates of hate speech predictions for sentences containing specific group identifiers, which is harmful to protected groups by misclassifying innocuous text (e.g., "I am a Muslim") as hate speech. We include two datasets for study, namely the Gab Hate Corpus (GHC; Kennedy et al., 2018) and the Stormfront corpus (de Gibert et al., 2018). Both datasets contain binary labels for hate and non-hate instances, though with differences in the labeling schemas and domains.…”
Section: Bias Factors and Datasets (mentioning)
confidence: 99%
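The group-identifier bias described here is typically quantified by comparing false positive rates on non-hateful posts that do and do not mention an identity term. A rough sketch of such a check follows; the identity-term list, column names, and data format are assumptions for illustration, not taken from the GHC release or the citing paper.

```python
# Rough sketch: compare false positive rates on non-hate posts that mention
# a group identifier vs. those that do not. Term list, column names, and
# prediction format are illustrative assumptions.
import pandas as pd

IDENTITY_TERMS = ["muslim", "jewish", "black", "gay", "immigrant"]  # hypothetical list


def false_positive_rate(labels, preds):
    """FPR over examples whose gold label is non-hate (0)."""
    negatives = [(y, p) for y, p in zip(labels, preds) if y == 0]
    if not negatives:
        return float("nan")
    return sum(p == 1 for _, p in negatives) / len(negatives)


def mentions_identity(text):
    """True if the post contains any term from the (hypothetical) identity list."""
    text = text.lower()
    return any(term in text for term in IDENTITY_TERMS)


def identity_fpr_gap(df, preds):
    """df has 'text' and binary 'hate' columns; preds are binary model predictions."""
    df = df.assign(pred=list(preds), has_id=df["text"].map(mentions_identity))
    fpr_id = false_positive_rate(df[df.has_id]["hate"], df[df.has_id]["pred"])
    fpr_no_id = false_positive_rate(df[~df.has_id]["hate"], df[~df.has_id]["pred"])
    # A large positive gap indicates the identifier bias the excerpt describes.
    return fpr_id - fpr_no_id
```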