Proceedings of the Fourth Arabic Natural Language Processing Workshop 2019
DOI: 10.18653/v1/w19-4633

The SMarT Classifier for Arabic Fine-Grained Dialect Identification

Abstract: This paper describes the approach adopted by the SMarT research group to build a dialect identification system in the framework of the MADAR shared task on Arabic fine-grained dialect identification. We experimented with several approaches and finally chose a Multinomial Naïve Bayes classifier based on word and character n-grams, in addition to language model probabilities. We achieved a score of 67.73% in terms of macro accuracy and a macro-averaged F1-score of 67.31%.
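
The following is a minimal, pure-Python sketch of the two technical ingredients the abstract names: a multinomial Naive Bayes classifier over word and character n-gram counts (with add-alpha smoothing), and the macro-averaged F1 score. It is not the SMarT system. The n-gram ranges (words 1 to 2, characters 2 to 4), `alpha=1.0`, all function names, and the toy transliterated sentences and city labels are illustrative assumptions. The language-model probability features the abstract mentions are omitted.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Word n-grams (1..n_max), prefixed 'w:' to keep them apart from char n-grams."""
    tokens = text.split()
    return ["w:" + " ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n_min=2, n_max=4):
    """Character n-grams over the space-padded sentence, prefixed 'c:'."""
    padded = f" {text} "
    return ["c:" + padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def features(text):
    """Bag of word and character n-gram counts for one sentence."""
    return Counter(word_ngrams(text) + char_ngrams(text))


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)  # class -> feature -> count
        self.totals = Counter()                  # class -> total feature count
        self.vocab = set()
        for text, label in zip(texts, labels):
            f = features(text)
            self.feat_counts[label].update(f)
            self.totals[label] += sum(f.values())
            self.vocab.update(f)
        self.n_docs = len(labels)
        return self

    def log_scores(self, text):
        """log P(class) + sum over features of count * log P(feature | class)."""
        f = features(text)
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / self.n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for feat, k in f.items():
                if feat not in self.vocab:  # ignore features never seen in training
                    continue
                score += k * math.log((self.feat_counts[label][feat] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for c in labels:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    # Toy, hypothetical transliterated sentences with MADAR-style city labels.
    train = ["ezzayak ya basha", "ana 3ayez ashrab shay",
             "kifak ya zalameh", "ana baddi eshrab shai"]
    labels = ["CAI", "CAI", "BEI", "BEI"]
    clf = MultinomialNB().fit(train, labels)
    print(clf.predict("ezzayak"), clf.predict("kifak"))  # prints: CAI BEI
```

Prefixing features with `w:` and `c:` keeps a word unigram such as "ya" distinct from the character bigram "ya". In practice a library vectorizer (for example scikit-learn's `CountVectorizer` with `analyzer="char_wb"`) would usually replace the hand-written n-gram functions. Macro-averaging gives every dialect label equal weight regardless of how many test sentences it has.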

Cited by 8 publications (8 citation statements). References: 14 publications.
“…The PADIC (Parallel Arabic DIalect Corpus) (Meftouh et al, 2015) multi-dialectal corpus contains six dialects in addition to MSA. Two Algerian dialect corpora were created: Annaba's dialect (a city in Algeria) from daily conversations and the dialect from movies/TV shows in the Algiers dialect.…”
Section: Dialectical Arabic Datasets (mentioning)
confidence: 99%
“…This prompted researchers to create new DA datasets, usually targeting a limited number of specific regions or countries (Gadalla et al, 1997; Diab et al, 2010; Al-Sabbagh and Girju, 2012; Sadat et al, 2014; Harrat et al, 2014; Jarrar et al, 2016; Khalifa et al, 2016; Al-Twairesh et al, 2018; Alsarsour et al, 2018; Kwaik et al, 2018; El-Haj, 2020). This was followed by several works that introduced multi-dialectal datasets and models for region-level dialect identification (Zaidan and Callison-Burch, 2011; Bouamor et al, 2014; Meftouh et al, 2015). The initial Arabic dialect identification shared tasks were part of the VarDial workshop series, primarily utilizing transcriptions of speech broadcasts (Malmasi et al, 2016).…”
Section: Arabic Dialects (mentioning)
confidence: 99%
“…(2) Translation in which participants are asked to translate sentences into their native Arabic dialects (Ho, 2006-;Meftouh et al, 2015;Bouamor et al, 2018;Mubarak, 2018). If all the participants are asked to translate the same source sentences, then the dataset is composed of parallel sentences in various dialects.…”
Section: Dialects Sentence (mentioning)
confidence: 99%
“…MPCA -/ 5 / 3 -2,000 Egyptian Arabic sentences from a pre-existing corpus, manually translated into 4 other country-level dialects in addition to MSA. PADIC (Meftouh et al, 2015) 5 / 4 / 2 -6,400 sentences sampled from the transcripts of recorded conversations and movie/TV shows in Algerian Arabic and manually translated into 4 other dialects and MSA. DIAL2MSA (Mubarak, 2018) -/ -/ 4 -Dialectal tweets manually translated into MSA.…”
Section: Dialects Sentence (mentioning)
confidence: 99%