2020
DOI: 10.48550/arxiv.2012.09686
Preprint

Hate Speech detection in the Bengali language: A dataset and its baseline evaluation

citations
Cited by 3 publications
(4 citation statements)
references
References 0 publications
“…Nevertheless, with the advances in multilingual parsers and deep learning technology, together with increasing pressures from policy-makers to handle hate speech issues at local resources, non-English HS detection toolkits have seen a steady increase. The figure indicates that about 51% of all works in this field are performed on English dataset, with an increase of proportion of other languages as well where Arabic (13%) [93,59,12,143], Turkish (6%) [143,104], Greek (4%) [143,6,136], Danish (5%) [106,143], Hindi (4%) [121,22,88], German (4%) [72,120], Malayalam (3%) [130,109], Tamil (3%) [130,20], Chinese (1%) [138,139,155], Italian (2%) [116], Urdu (1%) [126,95,7], Russian (1%) [17], Bengali (1%) [62,127,69], Korean (1%) [91], French (1%) [16,102,50], Indonesian (1%) [14], Portuguese (1%) [14], Spanish (1%) [56] and Polish (1%) [118] seem to dominate the rest of the languages in this field.…”
Section: Statistical Trends Of Results (mentioning)
confidence: 99%
“…As for the Bengali dataset, we reuse the one from Romim et al [19]. The authors extracted comments from Facebook pages and Youtube videos whose topics range from celebrity and sports to crime and politics.…”
Section: Data (mentioning)
confidence: 99%
“…Accurate identification of hate speech in Bengali is a challenging task. Only a few restrictive approaches [24,25,2] have been proposed so far. Romim et al [24] prepared a dataset of 30K comments, making it one of the largest datasets for identifying offensive and hateful statements.…”
Section: Related Work (mentioning)
confidence: 99%
“…Only a few restrictive approaches [24,25,2] have been proposed so far. Romim et al [24] prepared a dataset of 30K comments, making it one of the largest datasets for identifying offensive and hateful statements. However, this dataset has several issues.…”
Section: Related Work (mentioning)
confidence: 99%