Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2014
DOI: 10.3115/v1/p14-2125

Sentence Level Dialect Identification for Machine Translation System Selection

Abstract: In this paper we study the use of sentence-level dialect identification to optimize machine translation system selection when translating mixed-dialect input. We test our approach on Arabic, a prototypical diglossic language, and we optimize the combination of four different machine translation systems. Our best result improves over the best single MT system baseline by 1.0% BLEU and over a strong system selection baseline by 0.6% BLEU on a blind test set.
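As a rough illustration of the selection idea in the abstract, the sketch below labels each input sentence as MSA or dialectal Arabic (DA) and routes it to the MT system associated with that label. It is a minimal sketch under stated assumptions, not the authors' method. The classifier is a toy lookup over a few illustrative Egyptian Arabic marker words (the hypothetical `DA_MARKERS` set), the function names `identify_dialect` and `select_and_translate` are hypothetical, and the stub translators stand in for real systems. The paper itself combines four systems, while this sketch uses only two labels.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical marker list: a few common Egyptian Arabic words
# ("not", "want", "how", "now"). A real dialect identifier would be a
# trained classifier over many features, not a fixed word list.
DA_MARKERS = {"مش", "عايز", "ازاي", "دلوقتي"}

# A translator maps a source sentence to a target-language string.
Translator = Callable[[str], str]


def identify_dialect(sentence: str, threshold: int = 1) -> str:
    """Toy sentence-level dialect ID: return 'DA' if at least `threshold`
    whitespace-separated tokens are marker words, otherwise 'MSA'."""
    hits = sum(1 for token in sentence.split() if token in DA_MARKERS)
    return "DA" if hits >= threshold else "MSA"


def select_and_translate(
    sentences: List[str], systems: Dict[str, Translator]
) -> List[Tuple[str, str]]:
    """Label each sentence, then translate it with the system for that label.
    Returns (label, translation) pairs in input order."""
    results = []
    for sentence in sentences:
        label = identify_dialect(sentence)
        results.append((label, systems[label](sentence)))
    return results


if __name__ == "__main__":
    # Stub systems standing in for MSA-trained and DA-trained MT engines.
    systems: Dict[str, Translator] = {
        "MSA": lambda s: "[MSA system] " + s,
        "DA": lambda s: "[DA system] " + s,
    }
    mixed_input = ["كيف حالك اليوم", "انا مش عايز اروح"]
    for label, output in select_and_translate(mixed_input, systems):
        print(label, output)
```

In this sketch the routing happens before translation, so only one system runs per sentence; the `threshold` parameter controls how many marker words must appear before a sentence is treated as dialectal.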

Cited by 26 publications (24 citation statements). References 17 publications.

“…(1) MSA and dialectal Arabic; (2) four-way classification: MSA, Egyptian Arabic, Gulf Arabic, and Levantine Arabic; and (3) three-way classification: Egyptian Arabic, Gulf Arabic, and Levantine Arabic. Salloum et al. (2014) explores the use of sentence-level Arabic dialect identification as a pre-processor for MT, in customizing the selection of the MT model used to translate a given sentence to the dialect it uses. In performing dialect-specific MT, the authors achieve an improvement of 1.0% BLEU score compared with a baseline system which does not differentiate between Arabic dialects.…”
Section: Similar Languages, Language Varieties, and Dialects (mentioning)

“…Whereas adapting large MSA/English parallel data gives significant reduction of OOV rate to 0.7% and leads to an absolute BLEU increase of 2.73 points. Salloum et al. (2014) explored the impact of sentence-level dialect identification used with various linguistic features on machine translation performance. They attempted to optimize the selection of outputs produced by different MT systems given an input text including a mixture of dialects and MSA.…”
Section: Translating Between Arabic Dialects and Foreign Languages (mentioning)

“…ADAM is used in a sentence-level dialect identification approach for machine translation system selection when translating mixed dialect input (MSA and DA) (Salloum et al., 2014). We acquired two sets of training data: DA-to-English (5 M words) and MSA-to-English (57 M words).…”
Section: Dialect Identification for MT System Selection (mentioning)