Proceedings of the 2nd Workshop on Abusive Language Online (ALW2) 2018
DOI: 10.18653/v1/w18-5119

Decipherment for Adversarial Offensive Language Detection

Abstract: Automated filters are commonly used by online services to stop users from sending age-inappropriate or bullying messages, or from asking others to expose personal information. Previous work has focused on rules or classifiers to detect and filter offensive messages, but these are vulnerable to cleverly disguised plaintext and unseen expressions, especially in an adversarial setting where users can repeatedly try to bypass the filter. In this paper, we model the disguised messages as if they are produced by encrypti…

Citations: cited by 6 publications (4 citation statements)
References: 19 publications
“…Toxic content detection attempts to identify content that can offend or harm its recipients, including hate speech (Wang 2018), racism (Waseem and Hovy 2016), and offensive language (Wu, Kambhatla, and Sarkar 2018). Given the subjectivity of these categorizations, we do not limit the scope of our work to any specific type and address toxic content in general.…”
Section: Task and Datasets (mentioning)
confidence: 99%
“…Computational decipherment based techniques have seen a wide range of applications, such as identifying unknown languages and scripts (Hauer and Kondrak, 2016), writing systems (Born et al., 2019, 2021) and lost languages (Snyder et al., 2010; Luo et al., 2019), offensive language detection (Wu et al., 2018; Qian et al., 2019), and, more recently, improving neural machine translation (Kambhatla et al., 2022). While decipherment has strong connections to cryptography research, we limit the scope of this work to natural language based decipherment.…”
Section: Other Related Work (mentioning)
confidence: 99%
“…The former relates to purposefully misspelling or otherwise symbolically replacing text (e.g., fvk you, @ssh*l3) to subvert algorithms (Eger et al., 2019; Kurita et al., 2019). Wu et al. (2018) show such attacks on toxic content can be effectively deciphered. Word-level attacks are arguably straightforward for humans, but significantly more challenging to automate, requiring preservation of toxicity, i.e., the semantics of the sentence.…”
Section: Related Work (mentioning)
confidence: 99%