Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2: Short Papers, 2014
DOI: 10.3115/v1/e14-4029

Integrating an Unsupervised Transliteration Model into Statistical Machine Translation

Abstract: We investigate three methods for integrating an unsupervised transliteration model into an end-to-end SMT system. We induce a transliteration model from parallel data and use it to translate out-of-vocabulary (OOV) words. Our approach is fully unsupervised and language independent. Using these methods to integrate transliterations, we observed improvements of 0.23-0.75 BLEU points (∆ 0.41) across 7 language pairs. We also show that our mined transliteration corpora provide better rule coverage and translation quality compared to the…
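The basic division of labor described in the abstract, where words unknown to the translation model are routed to a transliteration model, can be sketched as follows. This is a minimal illustration under stated assumptions: `vocab` stands in for the source vocabulary of the phrase table, `transliterate_oovs` is a hypothetical helper, and the character table `CYR_TO_LAT` is a toy replacement for a transliteration model induced from parallel data. It does not reproduce any of the three integration methods evaluated in the paper.

```python
from typing import Callable


def transliterate_oovs(
    tokens: list[str],
    vocab: set[str],
    transliterate: Callable[[str], str],
) -> dict[str, str]:
    # Map each out-of-vocabulary token to its transliteration;
    # in-vocabulary tokens are left to the regular translation model.
    return {tok: transliterate(tok) for tok in tokens if tok not in vocab}


# Hypothetical toy character table; a real system would use a
# transliteration model learned from data, not a fixed mapping.
CYR_TO_LAT = str.maketrans(
    {"м": "m", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a"}
)


def toy_transliterate(word: str) -> str:
    return word.translate(CYR_TO_LAT)


print(transliterate_oovs(["в", "москва"], {"в"}, toy_transliterate))
```

Only the OOV token reaches the transliterator; the known token stays with the translation model.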

Cited by 67 publications (63 citation statements). References: 11 publications.

“…We could not train a transliteration system due to unavailability of a transliteration training data. This year we used an EM-based method to induce unsupervised transliteration models (Durrani et al, 2014b). We extracted transliteration pairs automatically from the word-aligned parallel data and used it to learn a transliteration system.…”
Section: Unsupervised Transliteration Model (mentioning, confidence: 99%)

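Before any mining, candidate word pairs have to be collected from the word-aligned parallel data. The sketch below is a minimal illustration under assumptions: alignments are given in the common `i-j` format (source token index, target token index), only one-to-one links are kept as candidates, and the function name `one_to_one_pairs` is hypothetical. The filtering used in the cited work may differ.

```python
from collections import Counter


def one_to_one_pairs(src: str, tgt: str, alignment: str) -> list[tuple[str, str]]:
    """Return (source word, target word) pairs joined by a 1-to-1 link.

    `alignment` is a space-separated list of "i-j" links, where i indexes
    the source tokens and j the target tokens.
    """
    s_toks, t_toks = src.split(), tgt.split()
    links = [tuple(map(int, link.split("-"))) for link in alignment.split()]
    # Count how many links touch each source and each target position.
    src_deg = Counter(i for i, _ in links)
    tgt_deg = Counter(j for _, j in links)
    # Keep a link only if neither end participates in any other link.
    return [
        (s_toks[i], t_toks[j])
        for i, j in links
        if src_deg[i] == 1 and tgt_deg[j] == 1
    ]


candidates = one_to_one_pairs("Moscow is big", "москва большая", "0-0 2-1")
```

Here `candidates` contains both ("Moscow", "москва") and ("big", "большая"); only the first is a transliteration, so a separate mining step is still needed to filter the list.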
“…We used the post-decoding transliteration option with this tool. UTM uses a transliteration phrase translation table to evaluate and score multiple possible transliterations (Durrani et al, 2014).…”
Section: Data Pre-processing (mentioning, confidence: 99%)

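Post-decoding transliteration can be pictured as a second pass over the decoder output: tokens that the decoder passed through untranslated are looked up in a transliteration table and replaced by their best-scoring candidate. The sketch below is a simplification under assumptions: the table layout (a word mapped to `(candidate, log-score)` pairs) and the function name are hypothetical, and candidates are chosen independently of context, whereas a real system would typically also weigh each candidate with a language model.

```python
def post_decode_transliterate(
    output_tokens: list[str],
    oov_words: set[str],
    table: dict[str, list[tuple[str, float]]],
) -> list[str]:
    result = []
    for tok in output_tokens:
        # Only untranslated OOV tokens are candidates for replacement.
        candidates = table.get(tok) if tok in oov_words else None
        if candidates:
            # Pick the candidate with the highest (log) score.
            best, _ = max(candidates, key=lambda c: c[1])
            result.append(best)
        else:
            result.append(tok)
    return result


hyp = post_decode_transliterate(
    ["he", "lives", "in", "москве"],
    {"москве"},
    {"москве": [("moskve", -1.2), ("moscow", -2.5)]},
)
```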
“…Hence, we mine transliteration corpora for 110 language pairs from the ILCI corpus, a parallel translation corpora of 11 Indian languages (Jha, 2012). Transliteration pairs are mined using the unsupervised approach proposed by Sajjad et al (2012) and implemented in the Moses SMT system (Durrani et al, 2014). Their approach models parallel translation corpus generation as a generative process comprising an interpolation of a transliteration and a non-transliteration process.…”
Section: Transliteration Mining (mentioning, confidence: 99%)
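The interpolation described in the last statement can be written as p(e, f) = (1 − λ) p_tr(e, f) + λ p_ntr(e, f), where here λ denotes the prior probability that a candidate pair is not a transliteration. The posterior that a pair is a transliteration is then (1 − λ) p_tr(e, f) / p(e, f). The following sketch estimates only λ by EM, holding the two sub-model probabilities fixed at toy values; in the mining approach the sub-models are trained as well, so this is a simplified illustration rather than the Moses implementation. The function names and the 0.5 decision threshold are assumptions.

```python
def posteriors(p_tr: list[float], p_ntr: list[float], lam: float) -> list[float]:
    # P(transliteration | pair) = (1 - lam) * p_tr / ((1 - lam) * p_tr + lam * p_ntr)
    return [(1 - lam) * t / ((1 - lam) * t + lam * n) for t, n in zip(p_tr, p_ntr)]


def em_lambda(
    p_tr: list[float],
    p_ntr: list[float],
    lam: float = 0.5,
    iterations: int = 50,
) -> tuple[float, list[float]]:
    """Estimate lam, the prior of the non-transliteration process, by EM.

    Sub-model probabilities are held fixed and assumed to be positive.
    """
    for _ in range(iterations):
        post = posteriors(p_tr, p_ntr, lam)  # E-step
        lam = 1 - sum(post) / len(post)  # M-step: expected non-transliteration share
    return lam, posteriors(p_tr, p_ntr, lam)


# Toy probabilities for two hypothetical candidate pairs: the first scores
# high under the transliteration sub-model, the second does not.
lam, post = em_lambda([0.09, 0.001], [0.001, 0.02])
mined = [k for k, p in enumerate(post) if p > 0.5]  # assumed threshold
```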