Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.559

XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages

Abstract: We present XHATE-999, a multi-domain and multilingual evaluation data set for abusive language detection. By aligning test instances across six typologically diverse languages, XHATE-999 for the first time allows for disentanglement of the domain transfer and language transfer effects in abusive language detection. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHATE-999 as a co…

Citations: cited by 41 publications (70 citation statements)
References: 56 publications (58 reference statements)
“…In Swamy, Jamatia & Gambäck (2019)’s study with fine-tuned BERT models (Devlin et al., 2019), Founta and OLID produced models that performed well on each other. The source of such differences are usually traced back to search terms (Swamy, Jamatia & Gambäck, 2019), topics covered (Nejadgholi & Kiritchenko, 2020; Pamungkas, Basile & Patti, 2020), label definitions (Pamungkas & Patti, 2019; Pamungkas, Basile & Patti, 2020; Fortuna, Soler-Company & Wanner, 2021), and data source platforms (Glavaš, Karan & Vulić, 2020; Karan & Šnajder, 2018).…”
Section: Generalisation Studies In Hate Speech Detection (citation type: mentioning)
confidence: 99%
“…The ideal case would be to be able to use data in one language for training and apply the model on data in another language, which would help address the challenge in low-resource languages. In a few studies (Pamungkas, Basile & Patti, 2020; Glavaš, Karan & Vulić, 2020; Arango, Pérez & Poblete, 2020; Fortuna, Soler-Company & Wanner, 2021), language was included as a separate variable, alongside a “domain” variable independent to it, which is characterised by the source platform or the data collection method. These cross-lingual experiments are summarised in Table 3.…”
Section: Generalisation Studies In Hate Speech Detection (citation type: mentioning)
confidence: 99%