2023
DOI: 10.1145/3575860

Tamil Offensive Language Detection: Supervised versus Unsupervised Learning Approaches

Abstract: Studies on Natural Language Processing are mainly conducted in English, with very few exploring under-resourced languages, including the Dravidian languages. We present a novel work in detecting offensive language using a corpus of Tamil comments collected from YouTube. The study specifically aims to compare two machine learning approaches, namely supervised and unsupervised, to detect offensive patterns in textual communications. In the first setup, offensive language detection models wer…

Citations: cited by 4 publications (2 citation statements)
References: 41 publications
“…Furthermore, the ethical implications of deploying automated systems for hate speech detection cannot be overlooked. Concerns about privacy, freedom of speech, and the potential for over-surveillance are paramount [29]. The risk of false positives, where benign content is mistakenly classified as hate speech, poses a threat to free expression and could result in unwarranted censorship [30].…”
Section: Discussion (mentioning)
confidence: 99%
“…A feature shared by the above studies is the focus on addressing cyberbullying challenges associated with high-resource languages. There is an increasing interest in the detection of offensive content and hate speech in low-resource languages such as Tamil [34,35], Pashto [36], Urdu [37], Persian [38]. Similar studies have been focused on improving resources for tackling offensive and hateful content detection [39,40,41,42].…”
Section: Offensive Content Detection (mentioning)
confidence: 99%