Mohamed Lichouri scite author profile

In this paper, we conduct a study to evaluate the performance of statistical and neural methods to classify Arabic Dialects (AD). This evaluation is based on two kinds of corpora. The first one is a corpus named PADIC (Parallel Arabic DIalectal Corpus), which is a multi-dialectal corpus composed of six dialects: two Algerian dialects (of Algiers and Annaba cities), Palestinian, Syrian, Tunisian, and Moroccan, in addition to MSA. The second one (AraDial) is a manually collected corpus that contains the same dialects as well as the same number of sentences as PADIC (6412 sentences for each dialect). In our experiments, we used both statistical and neural classifiers, namely: Gaussian Naive Bayes, Bernoulli Naive Bayes, kNN, Logistic Regression, SGD Classifier, Passive Aggressive Classifier, Perceptron, Linear Support Vector, and Convolutional Neural Network Classifiers. We evaluated these classifiers in two setups: training on a parallel corpus (PADIC) and testing on the non-parallel corpus and vice versa. The obtained results have shown that training our system on a non-parallel corpus will give better results, as we achieved a mean score of 92.08\%.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mohamed Lichouri

Word-Level vs Sentence-Level Language Identification: Application to Algerian and Arabic Dialects

A Language Identification System Based on Voxforge Speech Corpus

CLIASR: A Combined Automatic Speech Recognition and Language Identification System

Efficient Machine Learning-based Approach for Brain Tumor Detection Using the CAD System

Impact of Parallel vs. Non Parallel Corpora on the Identification of Arabic Dialects

Contact Info

Product

Resources

About