Proceedings of the First International Workshop on Social Media Retrieval and Analysis 2014
DOI: 10.1145/2632188.2632207

Automatic identification of Arabic dialects in social media

Abstract: Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic Dialects (AD) or daily language differs from MSA especially in social media communication. However, most Arabic social media texts have mixed forms and many variations especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using the character n-gram Markov language…
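
The abstract describes classifying dialects with probabilistic models over character n-grams. As a rough illustration only, the following is a minimal sketch of a Naive Bayes classifier over character trigrams with add-one smoothing. It is not the authors' implementation and does not cover their Markov language model; the class name `CharNgramNaiveBayes`, the labels `dialect_a` and `dialect_b`, and the toy training strings are all hypothetical placeholders, not real dialect data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padding the edges with spaces."""
    padded = " " * (n - 1) + text + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNaiveBayes:
    """Hypothetical illustration: multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()               # label -> number of training texts
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.ngram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Return log prior plus summed log likelihoods for each label."""
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothing; the extra +1 reserves mass for unseen n-grams.
        denom_extra = len(self.vocab) + 1
        scores = {}
        for label, counts in self.ngram_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (total + denom_extra))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy placeholder data (assumption): invented strings, not real Arabic text.
texts = ["aaa bab aab", "aab aaa", "zzx zxz xzz", "xzz zzx"]
labels = ["dialect_a", "dialect_a", "dialect_b", "dialect_b"]
model = CharNgramNaiveBayes(n=3).fit(texts, labels)
```

Calling `model.predict(...)` on a new string returns the label whose trigram statistics best match it. Character-level features are a common choice for short, noisy social media text because they need no tokenizer and tolerate spelling variation.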

Citations: cited by 39 publications (29 citation statements)
References: 10 publications
“…LI for closely-related languages, language varieties, and dialects has been studied for Malay-Indonesian (Ranaivo-Malançon, 2006), Indian languages (Murthy and Kumar, 2006), South Slavic languages (Ljubešić et al, 2007;Tiedemann and Ljubešić, 2012;Kranjcić, 2014, 2015), Serbo-Croatian dialects (Zecevic and Vujicic-Stankovic, 2013), English varieties (Lui and Cook, 2013;Simaki et al, 2017), Dutch-Flemish (van der Lee and Bosch, 2017), Dutch dialects (including a temporal dimension) (Trieschnigg et al, 2012), German Dialects (Hollenstein and Aepli, 2015) Mainland-Singaporean-Taiwanese Chinese (Huang and Lee, 2008), Portuguese varieties (Zampieri and Gebre, 2012;, Spanish varieties Maier and Gómez-Rodríguez, 2014), French varieties (Mokhov, 2010a,b;Diwersy et al, 2014), languages of the Iberian Peninsula , Romanian dialects (Ciobanu and Dinu, 2016), and Arabic dialects Zaidan and Callison-Burch, 2014;Tillmann et al, 2014;Sadat et al, 2014b;Wray, 2018), the last of which we discuss in more detail in this section. As to off-the-shelf tools which can identify closely-related languages, Zampieri and Gebre (2014) released a LI system trained to identify 27 languages, including 10 language varieties.…”
Section: Similar Languages Language Varieties and Dialects (citation type: mentioning)
confidence: 99%
“…The principal goal of sentiment analysis is to classify text as positive, negative or neutral [10]. Sentiment analysis for Arabic language possesses different challenges compared to other languages such as dealing with Modern Standard Language (MSA) and Dialects that significantly varies from one region to another [18,5]. Sentiment analysis is based on three main approaches: 1) Supervised approach.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Sadat et al [26] developed a framework for Arabic dialects classification using probabilistic models across social media datasets. They carried out a set of experiments exploiting the n-gram technique with Markov language model and Naive Bayes classifiers.…”
Section: Related Techniques To Improve Arabic Sentiment Analysis (citation type: mentioning)
confidence: 99%