EVALITA Evaluation of NLP and Speech Tools for Italian 2018
DOI: 10.4000/books.aaccademia.4752

Merging datasets for hate speech classification in Italian

Abstract: This paper presents an approach to the shared task HaSpeeDe within Evalita 2018. We followed a standard machine learning procedure with training, validation, and testing phases. We used word embeddings as features and deep learning for classification. We tested the effect of merging two datasets on the classification of messages from Facebook and Twitter. We concluded that training and testing on data from the same social network was required to achieve good performance. Moreover, adding dat…

Cited by 19 publications
(9 citation statements)
References 14 publications
“…The task consists in automatically annotating messages from Twitter and Facebook, with a boolean value indicating the presence (or not) of hate speech. Similar to Germeval 2018 submissions, also in this case the participating systems adopt a wide range of approaches, including bi-LSTM [39], SVM [53], ensemble classifiers [52,4], RNN [28], CNN and GRU [60]. The authors of the best-performing system, ItaliaNLP [17], experiment with three different classification models: one based on linear SVM, another one based on a 1-layer BiLSTM and a newly-introduced one based on a 2-layer BiLSTM which exploits multi-task learning with additional data from the 2016 SENTIPOLC task 4 .…”
Section: Hate Speech Detection On Languages Different From English
Citation type: mentioning
confidence: 99%
“…StopPropagHate (Fortuna et al, 2018) The authors use a classifier based on Recurrent Neural Networks with a binary cross-entropy as loss function. In their system, each input word is represented by a 10000-dimensional vector which is a one-hot encoding vector.…”
Section: Grcp (De La Peña
Citation type: mentioning
confidence: 99%
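
To make the representation described in the statement above concrete, here is a minimal sketch of a 10000-dimensional one-hot word vector and a per-example binary cross-entropy loss. It is illustrative only and is not the StopPropagHate implementation: the names `VOCAB_SIZE`, `one_hot`, and `binary_cross_entropy` are hypothetical, and the recurrent network itself is omitted.

```python
import math

# Dimensionality taken from the quoted statement; the name is hypothetical.
VOCAB_SIZE = 10000


def one_hot(index, size=VOCAB_SIZE):
    """Return a vector of `size` zeros with a single 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of vocabulary range")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one example: -[y*log(p) + (1 - y)*log(1 - p)].

    `y_true` is the 0/1 hate-speech label and `y_pred` is the predicted
    probability of the positive class, clipped to avoid log(0).
    """
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))
```

In this sketch a word is identified by its integer position in an assumed vocabulary, so `one_hot(3)` stands for the word at index 3, and the loss shrinks as the predicted probability approaches the true label.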
“…Some academic work and several competitions have proposed some tasks to promote studies and advances in the area. Much of this work and data sets focus on English (Fortuna and Nunes, 2018) only, even though this is a widespread phenomenon that happens in any language.…”
Section: Introduction
Citation type: mentioning
confidence: 99%