Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing 2014
DOI: 10.3115/v1/w14-5813

Collaboratively Constructed Linguistic Resources for Language Variants and their Exploitation in NLP Application – the case of Tunisian Arabic and the Social Media

Abstract: Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dialects (AD) or daily language differs from MSA especially in social media communication. However, most Arabic social media texts have mixed forms and many variations especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for the translation of texts of social media. More precisely, this paper focuses on the Tunisian Dialect of Arabic (TAD) with an application on automatic…

Cited by 22 publications
(11 citation statements)
references
References 10 publications
“…(deep morphological corpus of 1500 sentence pairs Tunisian-to-MSA 84% representation of data) Dev/test set 750 sentence pairs, MSA-to-Tunisian 80% (Sadat et al, 2014) Rule-based approach 50 sentences BLEU score: 14.32 +Bilingual lexicon+LM (Tachicart and Bouzoubaa, 2014) Rule-based approach -+Bilingual lexicon+LM (Meftouh et al, 2015) Statistical approach 6 sides parallel corpus A set of BLEU scores of 6400 sentences Dev/test set 500 sentence for each corpus…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…The best results of translation were achieved between the dialects of Algeria which is not a surprising result since they share a large part of the vocabulary. It was also shown that the performance of machine translation between Palestinian and Syrian was relatively high because of the closeness of the two dialects. Concerning MSA, the best results of machine translation have been achieved with Palestinian dialect.…” [Table text interleaved in the extracted quote: (Bakr et al, 2008) Egyptian MSA (Salloum and Habash, 2012) Levantine, Egyptian, MSA Iraqi, Gulf Arabic (Mohamed et al, 2012) MSA Egyptian (Al-Gaphari and Al-Yadoumi, 2012) Sanaani (Yemenite) MSA (Hamdi et al, 2013) Tunisian MSA MSA Tunisian (Tachicart and Bouzoubaa, 2014) Moroccan MSA (Sadat et al, 2014) Tunisian MSA (Meftouh et al, 2015) Algerian, Tunisian, MSA Syrian and Palestinian MSA Algerian, Tunisian, Syrian and Palestinian]
Section: Translating Between MSA and Arabic Dialects (citation type: mentioning)
confidence: 99%
“…The described system works with Levantine (Jordanian, Syrian and Palestinian), Egyptian, Iraqi and Gulf Arabic dialects. Sadat [9] presented a model for the translation of the Tunisian Arabic Dialect to the standardized modern form of Arabic. The model is based on a bilingual lexicon which was designed for the particular context of the translation exercise and uses a set of grammatical mapping rules with an additional step for disambiguation.…”
Section: Related Work (citation type: mentioning)
confidence: 99%