2015
DOI: 10.48550/arxiv.1509.01938
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation

Abstract: Statistical machine translation for dialectal Arabic is characterized by a lack of data since data acquisition involves the transcription and translation of spoken language. In this study we develop techniques for extracting parallel data for one particular dialect of Arabic (Iraqi Arabic) from out-ofdomain corpora in different dialects of Arabic or in Modern Standard Arabic. We compare two different data selection strategies (cross-entropy based and submodular selection) and demonstrate that a very small but … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 18 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?