Proceedings of the Nineteenth Conference on Computational Natural Language Learning 2015
DOI: 10.18653/v1/k15-1005

AIDA2: A Hybrid Approach for Token and Sentence Level Dialect Identification in Arabic

Abstract: In this paper, we present a hybrid approach for performing token- and sentence-level Dialect Identification in Arabic. Specifically, we try to identify whether each token in a given sentence belongs to Modern Standard Arabic (MSA), Egyptian Dialectal Arabic (EDA), or some other class, and whether the whole sentence is mostly EDA or MSA. The token-level component relies on a Conditional Random Field (CRF) classifier that uses decisions from several underlying components such as language models, a named entity reco…

Citations: Cited by 21 publications (17 citation statements)
References: 11 publications
“…Similarly, Lins and Gonçalves (2004) used words from closed word classes, and Stupar et al (2011) used lists of function words. Al-Badrashiny et al (2015) used a lexicon of Arabic words and phrases that convey modality. Common to these features is that they are determined based on linguistic knowledge.…”
Section: Words (mentioning)
confidence: 99%
“…For example, Elfardy et al (2013) present a system for the detection of code-switching between MSA and Egyptian Arabic which selects a tag based on the sequence with a maximum marginal probability, considering 5-grams. A later version of the system is named AIDA2 (Al-Badrashiny et al, 2015) and it is a more complex hybrid system that incorporates different classifiers and components such as language models, a named entity recognizer, and a morphological analyzer. The classification strategy is built as a cascade voting system, whereby a Conditional Random Field (CRF) classifier tags each word based on the decisions from four other underlying classifiers.…”
Section: Related Work (mentioning)
confidence: 99%
“…Cross-lingual analysis has received some attention in the NLP community, especially when applied in neural systems. Among a few research directions of cross-lingual analysis are multilingual word embeddings (Ammar et al, 2016; Hermann and Blunsom, 2013) and dialect identification systems (Malmasi et al, 2016; Al-Badrashiny et al, 2015). Traditional NLP tasks such as POS-tagging (Cotterell and Heigold, 2017), morphological reinflection (Kann et al, 2017) and dependency parsing (Guo et al, 2015) benefit from cross-lingual training too.…”
Section: Cross-lingual Analysis (mentioning)
confidence: 99%