2021 IEEE Symposium on Security and Privacy (SP) 2021
DOI: 10.1109/sp40001.2021.00028

SoK: Hate, Harassment, and the Changing Landscape of Online Abuse

Abstract: In this paper, we explore the feasibility of leveraging large language models (LLMs) to automate or otherwise assist human raters with identifying harmful content including hate speech, harassment, violent extremism, and election misinformation. Using a dataset of 50,000 comments, we demonstrate that LLMs can achieve 90% accuracy when compared to human verdicts. We explore how to best leverage these capabilities, proposing five design patterns that integrate LLMs with human rating, such as pre-filtering non-vi…

Cited by 82 publications (31 citation statements)
References 121 publications
“…We use the term toxic content as an umbrella for identity-based attacks such as anti-Semitism or racism posted publicly to social media [19,37,55], bullying in online gaming or replies to posts [35,50], trolling [8], threats of violence, sexual harassment, and more [47,52]. These attacks represent just a subset of abuse stemming from hate and harassment, a much broader threat that encompasses any activity where an attacker attempts to inflict emotional harm on a target (e.g., stalking, doxxing, sextortion, and intimate partner violence) [9,52]. Unlike spam, phishing, or related abuse classification problems that can rely on expert raters, toxic content is an inherently subjective problem as we show in our work.…”
Section: What Is Toxic Content? (mentioning)
confidence: 99%
“…Online hate and harassment is a pernicious threat facing 48% of Internet users [52]. In response to this growing challenge, online platforms have developed automated tools to take action against toxic content (e.g., hate speech, threats, identity attacks).…”
Section: Introduction (mentioning)
confidence: 99%
“…Technology-enabled IPV is a troubling phenomenon that fits into the broader ecosystem of online hate, harassment, and abuse [49]. Its manifestations include many forms of harassment [6,23,39], character assassination through faked revenge porn [43], impersonation attacks that damage the targeted individual's relationships [25,40], and above all, spying on an intimate partner through stalkerware and other means, such as knowledge of the survivor's account credentials [24,25].…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…In the U.S., 15.8% of women and 5.3% of men reported being subjected to stalking violence "in which they felt very fearful or believed that they or someone close to them would be harmed or killed" [8]. IPV survivors have shed light on the many ways in which technology plays a role in inter-personal attacks [8,20,40,43,49], of which tech-enabled stalking and spying by current or former romantic partners are especially common and pernicious [24,34]. In a recent survey, 10% of the U.S. adult respondents admitted to using a mobile phone app to spy on an intimate partner [50].…”
Section: Introduction (mentioning)
confidence: 99%