Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, 2021
DOI: 10.18653/v1/2021.naacl-demos.17

MUDES: Multilingual Detection of Offensive Spans

Abstract: The interest in offensive content identification in social media has grown substantially in recent years. Previous work has dealt mostly with post-level annotations. However, identifying offensive spans is useful in many ways. To help cope with this important challenge, we present MUDES, a multilingual system to detect offensive spans in texts. MUDES features pre-trained models, a Python API for developers, and a user-friendly web-based interface. A detailed description of MUDES' components is presented in this paper.
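
The abstract mentions a Python API for developers. As a rough, unverified sketch of what calling such an API could look like, the snippet below assumes a `mudes` package exposing a `MUDESApp` class, a pre-trained English model identifier, and a `predict_toxic_spans` method returning offensive spans; these names and signatures are assumptions based on the abstract, not confirmed against the released package.

```python
# Hypothetical usage sketch of the MUDES Python API.
# Import path, class name, model identifier, and method signature are
# assumptions inferred from the project description, not verified.
from mudes.app.mudes_app import MUDESApp  # assumed import path

# Load a pre-trained offensive-span model; "en-base" is an assumed English
# model identifier, and use_cuda toggles GPU inference.
app = MUDESApp("en-base", use_cuda=False)

# predict_toxic_spans is assumed to return the offensive spans detected
# in the input text (e.g. as character offsets).
text = "This is an example sentence that may contain offensive words."
spans = app.predict_toxic_spans(text, spans=True)
print(spans)
```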

Cited by 27 publications (15 citation statements)
References 38 publications
“…Transformer models have been used successfully for various NLP tasks [13] such as text classification [18,37], NER [39,47], context similarity [17], language identification [19], etc. Most of the tasks were focused on the English language due to the fact that most of the pre-trained transformer models were trained on English data.…”
Section: Methods
confidence: 99%
“…Transformer models have been used successfully for various NLP tasks [13] such as text classification [18,37], NER [39,47], context similarity [17], language identification [19], and so on. Most of the tasks were focused on the English language due to the fact that most of the pre-trained transformer models were trained on English data.…”
Section: Methods
confidence: 99%
“…This multilingual scenario creates the need for developing more technology for local languages. This includes offensive language identification systems, which are often modeled as a supervised classification problem relying on large amounts of annotated data [14]. In this paper, we investigate strategies such as zero-shot learning and multilingual learning to circumvent data scarcity in six languages from the two most widely spoken language families in India, namely, Bengali, Hindi, and Urdu, i.e., three Indo-Aryan languages, and Kannada, Malayalam, and Tamil, the three Dravidian languages.…”
Section: Motivation
confidence: 99%
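
The zero-shot strategy mentioned in the statement above can be made concrete: a multilingual encoder is fine-tuned on English offensive-language labels and then applied unchanged to text in a target language with no labelled data. The sketch below uses the Hugging Face transformers library and the xlm-roberta-base checkpoint purely as an illustration; the fine-tuning step is omitted and the binary label layout is an assumption, so this is a structural sketch rather than the cited paper's exact setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Multilingual encoder; the binary offensive/not-offensive head created here
# is randomly initialised and would be fine-tuned on English annotations in a
# real zero-shot cross-lingual setup (fine-tuning omitted for brevity).
model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

# Zero-shot transfer: the (fine-tuned) model is applied directly to a
# target-language sentence it never saw labelled data for.
hindi_sentence = "यह एक उदाहरण वाक्य है।"  # "This is an example sentence."
inputs = tokenizer(hindi_sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# With a trained head, these would be [P(not offensive), P(offensive)].
probs = torch.softmax(logits, dim=-1)
print(probs)
```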