Proceedings of the Fourth Arabic Natural Language Processing Workshop 2019
DOI: 10.18653/v1/w19-4622

The MADAR Shared Task on Arabic Fine-Grained Dialect Identification

Abstract: In this paper, we present the results and findings of the MADAR Shared Task on Arabic Fine-Grained Dialect Identification. This shared task was organized as part of The Fourth Arabic Natural Language Processing Workshop, collocated with ACL 2019. The shared task includes two subtasks: the MADAR Travel Domain Dialect Identification subtask (Subtask 1) and the MADAR Twitter User Dialect Identification subtask (Subtask 2). This shared task is the first to target a large set of dialect labels at the city and count…

Cited by 72 publications (87 citation statements)
References 24 publications
“…In Arabic, it is prerequisite for most NLP tasks, where many subsequent tasks depend on it. We can find spoken dialect identification work in Biadsy et al MADAR shared task (Bouamor et al, 2019) consists of two sub-tasks which are…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…These native languages or dialects can be categorized based on their common linguistic features and geographical locations. This categorization is described in detail in Bouamor et al (2019). In the technological expansion of communication era, automatic identification of these dialects becomes an essential task for major natural language applications.…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…The corpus covers the dialects of 25 Arab cities and the MSA. It is the same data set described in Bouamor et al (2019) and . This corpus is composed of 2000 sentences translated to each dialect, with a total of 52000 sentences.…”
Section: Data | Citation type: mentioning
confidence: 99%
“…In this work, we used the MADAR Travel Domain dataset built by translating the Basic Traveling Expression Corpus (BTEC) (Takezawa et al, 2007). The whole sentences have been translated manually from English and French to the different Arabic dialects by speakers of 25 dialects (Bouamor et al, 2019). The training data is composed of 1600 sentences for each of the 25 dialects in addition to MSA.…”
Section: Dataset | Citation type: mentioning
confidence: 99%
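
The corpus sizes quoted in the Data and Dataset statements above can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes that both statements describe the same 26 varieties (25 city dialects plus MSA) with the same number of sentences for each.

```python
# Figures taken from the quoted citation statements; all names are illustrative.
NUM_CITY_DIALECTS = 25
NUM_VARIETIES = NUM_CITY_DIALECTS + 1   # 25 city dialects plus MSA
SENTENCES_PER_VARIETY = 2000            # "2000 sentences translated to each dialect"
TRAIN_SENTENCES_PER_VARIETY = 1600      # "1600 sentences for each of the 25 dialects in addition to MSA"


def corpus_size(varieties: int, sentences_per_variety: int) -> int:
    """Total sentence count when every variety has the same number of sentences."""
    return varieties * sentences_per_variety


total = corpus_size(NUM_VARIETIES, SENTENCES_PER_VARIETY)
train = corpus_size(NUM_VARIETIES, TRAIN_SENTENCES_PER_VARIETY)

print(total)  # matches the "total of 52000 sentences" in the Data statement
print(train)  # training sentences implied by the Dataset statement
```

Under these assumptions the full corpus comes to 26 × 2000 = 52000 sentences, which agrees with the quoted total, and the training portion comes to 26 × 1600 = 41600 sentences.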