2017
DOI: 10.1145/3032963

On Obstructing Obscenity Obfuscation

Abstract: Obscenity (the use of rude words or offensive expressions) has spread from informal verbal conversations to digital media, becoming increasingly common on user-generated comments found in Web forums, newspaper user boards, social networks, blogs, and media-sharing sites. The basic obscenity-blocking mechanism is based on verbatim comparisons against a blacklist of banned vocabulary; however, creative users circumvent these filters by obfuscating obscenity with symbol substitutions or bogus segmentations that s…
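
To make the abstract's point concrete, the following is a minimal sketch of a verbatim blacklist filter and of how symbol substitution or bogus segmentation slips past it. It is not the paper's method; the names `BLACKLIST` and `contains_banned` and the mild placeholder vocabulary are assumptions made for illustration.

```python
# Hypothetical placeholder vocabulary; a real blacklist would be much larger.
BLACKLIST = {"darn", "heck"}


def contains_banned(text, blacklist=BLACKLIST):
    """Return True if any whitespace-delimited token matches the blacklist verbatim."""
    # Lowercase each token and drop surrounding punctuation before comparing.
    tokens = (tok.strip(".,!?;:").lower() for tok in text.split())
    return any(tok in blacklist for tok in tokens)


print(contains_banned("Well, darn!"))    # plain spelling is caught
print(contains_banned("Well, d@rn!"))    # symbol substitution is missed
print(contains_banned("Well, d a r n"))  # bogus segmentation is missed
```

Because the comparison is verbatim, any token that differs from a listed word by a single character is treated as clean, which is the weakness the abstract describes.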

Citations: cited by 21 publications (22 citation statements)
References: 43 publications
“…The design of rules is often done in two steps. Dictionaries expressing OCL are prepared [11,224,296] (e.g., subjectivity lexicon, hate lexicon, and list of hate-representative grammatical relations [83], lists of profane words augmented with genomics-inspired techniques [225]). Then, lists of rules are defined to attribute a score to samples based on their use of the dictionary vocabulary, often using pattern or word matching.…”
Section: A6 Summary Of Common Classification Algorithms (mentioning)
confidence: 99%
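
The two-step design described in this statement (prepare a dictionary, then apply rules that score samples by their use of its vocabulary) can be sketched as follows. This is an illustrative assumption, not the cited survey's implementation: the `LEXICON` weights, the `THRESHOLD` value, and the function names `score` and `flag` are all hypothetical.

```python
import re

# Step 1: a dictionary of terms with hypothetical weights (placeholder vocabulary).
LEXICON = {"darn": 1.0, "heck": 0.5}

# Hypothetical decision threshold for the rule in step 2.
THRESHOLD = 1.0


def score(text, lexicon=LEXICON):
    """Sum the lexicon weights of every word in the text (simple word matching)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)


def flag(text, threshold=THRESHOLD):
    """Step 2: a single rule that flags a sample whose score reaches the threshold."""
    return score(text) >= threshold


print(score("What the heck, darn it"))
print(flag("heck"), flag("darn"))
```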
“…Such an approach yields good results for de-obfuscation, but it is computationally expensive and requires a dataset of obfuscated words for training. More recently, Rojas et al. [15] describe a more compact approach based on a dynamic programming sequence alignment algorithm. It has a different set of limitations, the main one being that it does not allow for one character to be used as an obfuscated version of several distinct original characters (it uses a one-to-one character mapping).…”
Section: A Abuse Detection (mentioning)
confidence: 99%
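
As a rough illustration of the kind of approach this statement refers to, the sketch below computes a Levenshtein-style global alignment cost by dynamic programming, treating a symbol as a free match when a one-to-one table maps it to the target letter. It is a generic stand-in, not the algorithm of the cited paper; the `SUBSTITUTIONS` table and the name `alignment_cost` are assumptions.

```python
# Hypothetical one-to-one table: each symbol stands for exactly one letter.
SUBSTITUTIONS = {"@": "a", "3": "e", "1": "i", "0": "o", "$": "s"}


def alignment_cost(candidate, target, subs=SUBSTITUTIONS):
    """Edit-distance style alignment cost; mapped symbols align at zero cost."""
    n, m = len(candidate), len(target)
    prev = list(range(m + 1))  # cost of aligning an empty candidate prefix
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            c, t = candidate[i - 1].lower(), target[j - 1]
            mismatch = 0 if (c == t or subs.get(c) == t) else 1
            curr[j] = min(
                prev[j] + 1,             # drop a candidate character
                curr[j - 1] + 1,         # skip a target character
                prev[j - 1] + mismatch,  # align the two characters
            )
        prev = curr
    return prev[m]


print(alignment_cost("d@rn", "darn"))     # substitution aligns for free
print(alignment_cost("d.a.r.n", "darn"))  # inserted separators cost one each
print(alignment_cost("he1l", "hell"))     # "1" maps only to "i", so this costs 1
```

The last call shows the limitation the statement mentions: with a one-to-one table, "1" cannot also stand for "l", so that spelling is not recovered at zero cost.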
“…This is due to the increasing plurality of online contexts and the fact that various obfuscation tactics have become common among users' repertoire of actions, next to the most studied practices of visibility. The assimilation (by protest movements, extremist groups but also by average users) of methods to escape censorship and being banned (Rojas-Galeano, 2017), along with the widespread use of VPNs, the algorithmic opacity of companies and institutions, and the use of memetic idiolects as forms of cultural cryptography, indicate that online social action increasingly takes place below the radar of research.…”
Section: Introduction (mentioning)
confidence: 99%