Downstream processing of machine translation (MT) output promises to be a solution to improve translation quality, especially when the MT system's internal decoding process is not accessible. Both rule-based and statistical automatic postediting (APE) methods have been proposed over the years, but with contrasting results. A missing aspect in previous evaluations is the assessment of different methods: i) under comparable conditions, and ii) on different language pairs featuring variable levels of MT quality. Focusing on statistical APE methods (more portable across languages), we propose the first systematic analysis of two approaches. To understand their potential, we compare them in the same conditions over six language pairs having English as source. Our results evidence consistent improvements on all language pairs, a relation between the extent of the gain and MT output quality, slight but statistically significant performance differences between the two methods, and their possible complementarity.
The paper presents an approach to morphological compound splitting that takes the degree of compositionality into account. We apply our approach to German noun compounds and particle verbs within a German-English SMT system, and study the effect of only splitting compositional compounds as opposed to an aggressive splitting. A qualitative study explores the translational behaviour of non-compositional compounds.
Compounding in morphologically rich languages is a highly productive process which often causes SMT approaches to fail because of unseen words. We present an approach for translation into a compounding language that splits compounds into simple words for training and, due to an underspecified representation, allows for free merging of simple words into compounds after translation. In contrast to previous approaches, we use features projected from the source language to predict compound mergings. We integrate our approach into end-to-end SMT and show that many compounds matching the reference translation are produced which did not appear in the training data. Additional manual evaluations support the usefulness of generalizing compound formation in SMT.
Support-verb constructions (i.e., multiword expressions combining a semantically light verb with a predicative noun) are problematic for standard statistical machine translation systems, because SMT systems cannot distinguish between literal and idiomatic uses of the verb. We work on the German to English translation direction, for which the identification of support-verb constructions is challenging due to the relatively free word order of German. We show that we achieve improved translation quality for verb-object supportverb constructions by marking the verbs when occuring in such constructions. Additional evaluations revealed that our systems produce more correct verb translations than a contrastive baseline system without verb markup.
We present the CimS submissions to the 2014 Shared Task for the language pair EN→DE. We address the major problems that arise when translating into German: complex nominal and verbal morphology, productive compounding and flexible word ordering. Our morphologyaware translation systems handle word formation issues on different levels of morpho-syntactic modeling.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.