Language identification systems have been the subject of extensive research since the 1970s, and dialect identification is a natural extension of such systems. Most current systems do not consider Arabic or its dialects, despite the language's immense importance. This paper addresses the problem of Arabic dialect identification. To the best of our knowledge, this is the first attempt to address this specific problem. Because no suitable dataset exists, we collect our own. To avoid collecting a prohibitively large dataset covering the dozens of existing Arabic dialects, we focus on two: the Jordanian dialect and the Egyptian dialect. We test several audio processing techniques for feature extraction and several classifiers to determine which feature/classifier combination gives the best results. Surprisingly, the results are very good.