2021
DOI: 10.1016/j.ipm.2021.102643

Offensive, aggressive, and hate speech analysis: From data-centric to human-centered approach

Cited by 72 publications (39 citation statements)
References 39 publications
“…The inter-annotator agreement plays a vital role in creating the datasets for hate speech as it affects the performance of a ML algorithm (Kocoń et al 2021). In context of fake news and hate speech, Twitter is the preferred social media platform for extracting information and preparing a dataset.…”
Section: Datasets (mentioning)
confidence: 99%
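The statement above refers to inter-annotator agreement without saying how it is measured. One common measure for two annotators is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a minimal sketch only; the function name `cohen_kappa` and the example labels are hypothetical and are not taken from the cited paper or the citing publication.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, invented purely for illustration.
annotator_1 = ["hate", "ok", "ok", "hate", "ok", "ok"]
annotator_2 = ["hate", "ok", "hate", "hate", "ok", "ok"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A kappa near 1 indicates strong agreement and a value near 0 indicates agreement no better than chance. For more than two annotators, other measures such as Fleiss' kappa or Krippendorff's alpha are typically used instead.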
“…Context dependency of whether an utterance is "toxic": The views about what constitutes unacceptable "toxic speech" differ between individuals and social groups (Kocoń et al, 2021). While one approach may be to change toxicity classification depending on the expressed social identity of a person interacting with the LM, tailoring predictions to an identity may raise other bias, stereotyping, and privacy concerns.…”
Section: Additional Considerations (mentioning)
confidence: 99%
“…First, setting such performance thresholds in a clear and accountable way requires participatory input from a broad community of stakeholders, which must be structured and facilitated. Second, views on what level of performance is needed are likely to diverge -for example, people hold different views of what constitutes unacceptable "toxic speech" (Kocoń et al, 2021). This raises political questions about how best to arbitrate conflicting perspectives (Gabriel, 2020a), and knock-on questions such as who constitutes the appropriate reference group in relation to a particular application or product.…”
Section: Benchmarking: When Is a Model "Fair Enough"? (mentioning)
confidence: 99%
“…(2) Creating identity-based pools on pre-existing datasets that looks for differences based on markers like age, gender, ESL, education e.g. on the Wikipedia Detox Dataset [22]. (3) Creating small, expert-based pools that perform annotations based on certain markers e.g.…”
Section: Related Work (mentioning)
confidence: 99%
“…This way, those who are likely to be targeted, and who would be best equipped to label the data, would be the ones to determine the ground truth for models that classify toxicity online. This paper continues to build upon research in this space of creating groups of annotators based on some differentiating factor(s) [3,16,22,41]. More specifically, we explore how raters from two relevant identity groups, African American and LGBTQ, label data that represents those identities, and whether their ratings vary from those provided by a randomly selected pool of raters who do not self-identify with these identity groups.…”
Section: Introduction (mentioning)
confidence: 99%