Proceedings of the 25th Brazilian Symposium on Multimedia and the Web 2019
DOI: 10.1145/3323503.3360619

Hate speech detection using Brazilian imageboards

Cited by 14 publications (8 citation statements)
References 19 publications
“…A second key distinction concerns the source from which data are retrieved. The microblogging platform Twitter is by far the most exploited source, due to the relatively reduced length of texts and to a friendly policy on making data publicly available: 32 resources contain tweets, one of which (Olteanu et al 2018) also features posts from the social aggregator Reddit, one (Nascimento et al 2019) also retrieves comments from the 55chan imageboard, while in two works (Bosco et al 2018; Mandl et al 2019 2018 use sentences from the well-known white-supremacist forum Stormfront; the dataset released for the Hate Speech Hackathon contains posts from the Wikipedia Topical focus: Abusiveness (5); Aggressiveness (2); Anti-Roma (1); Child sexual abuse (1); Cyberbullying (2); Flames (1); Harassment (1); Homophobia (4); HS (36); Islamophobia (2); Obscenity, Profanity (3); Offensiveness (13); Personal Attacks (1); Racism (6); Sexism, Misogyny (9); Threats, Violence (1); Toxicity (1); White supremacy (1). Nearly all the resources feature user-generated public contents, mostly microblog posts, often retrieved with a keyword-based approach and mostly using words with a negative polarity.…”
Section: Data Source (mentioning)
confidence: 99%
“…In (Basile et al 2019; Fersini et al 2018a) a combined approach has been applied to collect the hateful and misogynous tweets, by monitoring potential victims of hate accounts, downloading the history of identified haters and filtering Twitter streams with keywords. In few other cases (see Nascimento et al (2019)), a sort of a priori classification is attributed to the texts according to the retrieval source, assuming that all the items collected from a given source can be considered hateful. Quite uniquely, Fišer et al (2017) use a corpus extracted from an online platform that collects spontaneous reports by the Internet users of any material containing HS or child sexual abuse: the corpus is then checked by experts validation, assessing that more than 40% is not actually disturbing content and that only 3% can be considered illegal content.…”
Section: Data Source (mentioning)
confidence: 99%
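
The a priori, source-based labeling described in the statement above can be illustrated with a minimal Python sketch. It is not the cited authors' code: the source names, the label strings, and the `label_by_source` helper are hypothetical and chosen only to show the idea that every item inherits its label from where it was retrieved.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from retrieval source to an a priori label.
# Real source names and label sets would come from the dataset at hand.
SOURCE_LABELS: Dict[str, str] = {
    "hateful_board": "hate",
    "neutral_feed": "not_hate",
}


def label_by_source(items: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Label each (text, source) pair using only its retrieval source."""
    labeled = []
    for text, source in items:
        if source not in SOURCE_LABELS:
            # Refuse to guess a label for a source with no a priori decision.
            raise ValueError(f"no a priori label for source {source!r}")
        labeled.append((text, SOURCE_LABELS[source]))
    return labeled
```

Labels produced this way are only as reliable as the assumption that a source is uniformly hateful or uniformly not, which is why such corpora are sometimes checked by annotators afterwards.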
“…The Brazilian Portuguese version contains 14,459 words labeled in 73 categories. Some recent studies have presented promising results using this dictionary (Carvalho et al, 2019; Nascimento et al, 2019).…”
Section: Theoretical Background (mentioning)
confidence: 99%
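
A dictionary of the kind mentioned in the statement above, where each word is labeled with one or more categories, is typically used to turn a text into per-category counts. The following Python sketch shows that lookup under stated assumptions: the two lexicon entries, the category names, and the `category_counts` helper are invented for illustration and are not taken from the actual dictionary.

```python
import re
from collections import Counter
from typing import Dict, List

# Toy lexicon mapping a word to its categories.
# Entries and category names are hypothetical placeholders.
LEXICON: Dict[str, List[str]] = {
    "odeio": ["negemo", "anger"],
    "feliz": ["posemo"],
}


def category_counts(text: str) -> Counter:
    """Count how many tokens of `text` fall into each lexicon category."""
    tokens = re.findall(r"\w+", text.lower())
    counts: Counter = Counter()
    for token in tokens:
        # A word may belong to several categories; count each of them.
        for category in LEXICON.get(token, []):
            counts[category] += 1
    return counts
```

The resulting counts can serve as features for a classifier; words missing from the lexicon simply contribute nothing.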
“…A systematic review of hate speech detection was carried out by Poletto et al (2021); the authors pointed out that there exists more datasets for English than other languages, 37 out of 64 are English datasets, for Portuguese, there are only 2 datasets found by those authors. The first dataset, called NCCVG, was published in Nascimento et al (2019); it contains 7,671 entries from two data sources: Twitter and 55chan. The second dataset, called OFFCOMBR, published in de Pelle and Moreira (2017), contains 1,250 comments posted on a news site in Brazil (https://g1.globo.com/).…”
Section: Related Work (mentioning)
confidence: 99%