Proceedings of the EMNLP'2014 Workshop on Language Technology for Closely Related Languages and Language Variants 2014
DOI: 10.3115/v1/w14-4213

Handling OOV Words in Dialectal Arabic to English Machine Translation

Abstract: Dialects and standard forms of a language typically share a set of cognates that could bear the same meaning in both varieties or only be shared homographs but serve as faux amis. Moreover, there are words that are used exclusively in the dialect or the standard variety. Both phenomena, faux amis and exclusive vocabulary, are considered out of vocabulary (OOV) phenomena. In this paper, we present this problem of OOV in the context of machine translation. We present a new approach for dialect to English Statist…

Citations: cited by 3 publications (7 citation statements)
References: 11 publications
“…The availability of some parallel corpora makes this research direction possible. Furthermore, availability of new tools related to dialect identification (at word and sentences levels) has a positive impact on machine translation performance as it was shown in (Aminian et al, 2014; Salloum et al, 2014). Indeed, in this last work, identifying either the sentence is dialectal or MSA guides the selection of the MT system to use.…”
Section: Discussion (mentioning)
confidence: 87%
“…The performance of the system improves by 2.3 BLEU points when pivoting through MSA for first experiment, but when adding more dialectal data to training set (400k words) direct translation becomes better than mapping to MSA despite the significantly low OOV rate with MSA-mapping. Aminian et al (2014) dealt with OOV words in the context of Arabic to English SMT system. They adopted an approach that normalizes dialectal words to MSA words by using AIDA (Elfardy et al, 2014) and MADAMIRA (Pasha et al, 2014), to identify and replace dialectal Arabic OOV words with their MSA equivalents.…”
Section: Translating Between Arabic Dialects and Foreign Languages (mentioning)
confidence: 99%
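
The replacement step described in the statement above can be sketched roughly as follows. This is only an illustrative sketch, not the cited authors' implementation: the function name `replace_oov_with_msa`, the `smt_vocab` set, and the `dialect_to_msa` lookup table are all hypothetical, and a plain dictionary lookup stands in for the dialect identification and morphological analysis that the statement attributes to AIDA and MADAMIRA. The tokens are placeholders, not real Arabic words.

```python
from typing import Dict, List, Set


def replace_oov_with_msa(
    tokens: List[str],
    smt_vocab: Set[str],
    dialect_to_msa: Dict[str, str],
) -> List[str]:
    """Replace out-of-vocabulary tokens with an MSA equivalent when one is known.

    Assumption: `smt_vocab` holds the source words the translation system has
    seen, and `dialect_to_msa` is a hypothetical dialect-to-MSA lookup table.
    """
    normalized = []
    for token in tokens:
        is_oov = token not in smt_vocab
        if is_oov and token in dialect_to_msa:
            # Known dialectal OOV word: substitute its MSA equivalent.
            normalized.append(dialect_to_msa[token])
        else:
            # In-vocabulary words, and OOV words with no mapping, pass through.
            normalized.append(token)
    return normalized


# Placeholder tokens only: "d1" plays the role of a dialectal OOV word.
vocab = {"a", "m1"}
mapping = {"d1": "m1"}
result = replace_oov_with_msa(["a", "d1", "x"], vocab, mapping)
```

In this sketch, an OOV token with no entry in the table (here `"x"`) is left unchanged, so the downstream system still has to handle it.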
“…The main difference between this approach and our previous work as described in (Aminian et al, 2014) lies in the fact that we try to improve SMT lexical choice by enhancing FF translation. Rather than blindly replacing all dialectal words with their standard equivalent as we did in (Aminian et al, 2014), here we try to automatically identify FF as one of the important sources of translation degradation across language variants and leverage knowledge acquired from monolingual standard data to predict the best equivalent for FF based on the context.…”
Section: Related Work (mentioning)
confidence: 99%