Improved statistical machine translation using monolingually-derived paraphrases

Marton, Yuval; Callison-Burch, Chris; Resnik, Philip

doi:10.3115/1699510.1699560

Cited by 79 publications (68 citation statements)

References 28 publications (42 reference statements)


“…Our framework also makes it possible to compare human paraphrases with those obtained by automatic methods (e.g., Bannard and Callison-Burch [2005], Callison-Burch et al [2006], Callison-Burch [2008], and Marton et al [2009]) on a potentially large scale. This may help improve both our own collaborative translation process and the state of the art in automatic paraphrasing. More generally, any component in Figure 2 that is represented by a rectangle can be a task for either humans or machines. Any such component can therefore serve both as a source of data for evaluating and developing automated methods and as a testbed for those methods.…”

Section: Discussion (mentioning)

confidence: 99%

Using targeted paraphrasing and monolingual crowdsourcing to improve translation

Resnik, Buzek, Kronrod, et al. (2013)

ACM Trans. Intell. Syst. Technol.

Self Cite


Targeted paraphrasing is a new approach to the problem of obtaining cost-effective, reasonable quality translation, which makes use of simple and inexpensive human computations by monolingual speakers in combination with machine translation. The key insight behind the process is that it is possible to spot likely translation errors with only monolingual knowledge of the target language, and it is possible to generate alternative ways to say the same thing (i.e., paraphrases) with only monolingual knowledge of the source language. Formal evaluation demonstrates that this approach can yield substantial improvements in translation quality, and the idea has been integrated into a broader framework for monolingual collaborative translation that produces fully accurate, fully fluent translations for a majority of sentences in a real-world translation task, with no involvement of human bilingual speakers.


“…There are many ongoing attempts to develop MT systems for Indian languages (Antony, 2013; Kunchukuttan et al, 2014; Sreelekha et al, 2014; Sreelekha et al, 2015) using both rule-based and statistical approaches. There have been many attempts to improve the quality of statistical MT systems, such as using monolingually-derived paraphrases (Marton et al, 2009) and using related resource-rich languages (Nakov and Ng, 2012). Because rule-based systems require a large amount of human effort and linguistic knowledge to develop, statistical MT systems became the more efficient choice.…”

Section: Related Work (mentioning)

confidence: 99%

Role of Morphology Injection in SMT

Sreelekha and Bhattacharyya (2017)

ACM Trans. Asian Low-Resour. Lang. Inf. Process.


Phrase-based statistical models are more commonly used because they perform well in terms of both translation quality and system complexity. Hindi, and Indian languages in general, are morphologically richer than English. Phrase-based systems perform very well for less divergent language pairs. For English-to-Indian-language translation, however, we need more linguistic information on the source side, such as morphology, parse trees, and part-of-speech tags. Factored models seem useful in this case because they treat a word as a vector of factors. These factors can encode any information about the surface word, and the model uses that information while translating. The objective of this work is therefore to handle morphological inflections in Hindi and Marathi using factored translation models when translating from English. SMT approaches face a data sparsity problem when translating into a morphologically rich language, since a parallel corpus is very unlikely to contain all morphological forms of every word. We propose a solution that generates these unseen morphological forms and injects them into the original training corpora. In this paper, we study factored models and the problem of sparseness in the context of translation into morphologically rich languages. Our simple and effective solution enriches the input with various morphological forms of words. We observe that morphology injection improves translation quality in terms of both adequacy and fluency. We verify this with experiments on two morphologically rich languages, Hindi and Marathi, translating from English.

Morphology injection: a case study from an Indian language perspective


“…Therefore, many approaches to augment parallel sentences using paraphrasing have been proposed [5], [6]. Moreover, Callison-Burch et al [7], [8] proposed a method which augments a translation phrase table using paraphrases which are automatically acquired from parallel corpora [9]. However, there is no work which augments input sentences by paraphrasing and representing these paraphrases in lattices.…”

Section: Related Work (mentioning)

confidence: 99%

Paraphrase Lattice for Statistical Machine Translation

Onishi, Utiyama, and Sumita (2011)

IEICE Trans. Inf. & Syst.


Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities, such as speech recognition ambiguities and German word segmentation ambiguities. In this paper, we show that lattice decoding is also useful for handling input variations. "Input variations" refers to differences between input texts that have the same meaning. Given an input sentence, we build a lattice that represents paraphrases of the input sentence. We call this a paraphrase lattice. We then give the paraphrase lattice as input to a lattice decoder, which searches for the best path through the lattice and outputs the best translation. Experimental results on the IWSLT and Europarl datasets show that our proposed method obtains significant gains in BLEU scores.
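The abstract describes a lattice decoder that "searches for the best path" through a paraphrase lattice. A real SMT lattice decoder searches over paths and translations jointly, combining translation, language-model, and paraphrase features. The minimal Python sketch below isolates only the best-path part, so it is not the authors' decoder. It assumes a left-to-right word lattice whose nodes are integers numbered in topological order and whose arcs carry hypothetical log scores. The edge format, the `best_path` function, and all scores are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical arc format: (start_node, end_node, token, log_score).
Edge = Tuple[int, int, str, float]


def best_path(edges: List[Edge], start: int, end: int) -> Tuple[float, List[str]]:
    """Return the highest-scoring token sequence from start to end.

    Assumes every arc goes from a smaller to a larger node number, as in a
    word lattice built left to right over the input sentence.
    """
    outgoing: Dict[int, List[Edge]] = defaultdict(list)
    nodes = {start, end}
    for e in edges:
        outgoing[e[0]].append(e)
        nodes.update((e[0], e[1]))

    # best[n] = (score of best partial path reaching n, last arc on that path)
    best: Dict[int, Tuple[float, Optional[Edge]]] = {start: (0.0, None)}
    for node in sorted(nodes):  # ascending order is a topological order here
        if node not in best:
            continue  # node is unreachable from start
        score_here = best[node][0]
        for edge in outgoing[node]:
            nxt, weight = edge[1], edge[3]
            candidate = score_here + weight
            if nxt not in best or candidate > best[nxt][0]:
                best[nxt] = (candidate, edge)

    if end not in best:
        raise ValueError("end node is unreachable from start")

    # Follow back-pointers from end to start to recover the tokens.
    tokens: List[str] = []
    node = end
    while node != start:
        edge = best[node][1]
        tokens.append(edge[2])
        node = edge[0]
    return best[end][0], list(reversed(tokens))


# Toy lattice for "the price is high", with a hypothetical paraphrase arc
# "cost" spanning the same nodes as "price" (scores are made up).
example_edges: List[Edge] = [
    (0, 1, "the", -0.1),
    (1, 2, "price", -0.5),
    (1, 2, "cost", -0.2),   # paraphrase alternative for "price"
    (2, 3, "is", -0.1),
    (3, 4, "high", -0.3),
]
score, tokens = best_path(example_edges, 0, 4)
print(score, tokens)
```

Because arcs only point forward, visiting nodes in ascending order means each node's best score is final before its outgoing arcs are relaxed. This is the usual Viterbi-style dynamic program over a DAG, and it runs in time linear in the number of arcs after sorting the nodes. In the toy lattice, the paraphrase arc "cost" wins over "price" only because of the made-up scores.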
