2022
DOI: 10.36227/techrxiv.19690177.v1
Preprint

Hate Speech Recognition in multilingual text: Hinglish Documents

Abstract: In this paper, we apply and evaluate several machine learning and deep learning methods, along with various feature extraction and word-embedding techniques, on a consolidated dataset of 20600 instances, for hate speech detection from tweets and comments in Hinglish. The experimental results reveal that deep learning models perform better than machine learning models in general. Among the deep learning models, the CNN-BiLSTM model with word2vec word embedding provides the best results. The model yields 0.876 a…

Cited by 1 publication (1 citation statement)
References 28 publications
“…However, among the more than 7000 languages globally, the majority do not have sufficient training resources like the major languages such as English. According to statistics, about 40% of languages are facing extinction, with a user base of fewer than 1000 users [8][9][10]. It is the scarcity of the transcribed data for these low-resource language communities that prevents large neural networks from being substantially trained, leading to poor performance and a lack of real-world applications.…”
Section: Introduction; citation type: mentioning
confidence: 99%