Statistical Machine Translation has come a long way in improving translation quality across a range of linguistic phenomena. With negation, however, the techniques proposed and implemented for improving translation performance have simply followed from developers' beliefs about why performance is worse. These beliefs have never been validated by an error analysis of the translation output. In contrast, the current paper shows that an informative empirical error analysis can be formulated in terms of (1) the set of semantic elements involved in the meaning of negation, and (2) a small set of string-based operations that can characterise errors in the translation of those elements. Results on a Chinese-to-English translation task confirm the cross-linguistic robustness of our analysis, and its basic assumptions can inform an automated investigation into the causes of translation errors. Conclusions drawn from this analysis should guide future work on improving the translation of negative sentences.
1 Introduction

In recent years, there has been increasing interest in improving the quality of SMT systems over a wide range of linguistic phenomena, including coreference resolution (Hardmeier et al., 2014) and modality (Baker et al., 2012). Among these, however, translating negation is a problem that has not yet been researched thoroughly.

This paper takes an empirical approach towards understanding why negation is a problem in SMT. More specifically, we try to answer two main questions:

1. What kinds of errors are involved in translating negation?

2. What are the causes of these errors during decoding?

While previous work (section 2) has shown that translating negation is a problem, it has not addressed either of these questions. The present paper focuses on the first one: we show that tailoring to a semantic task the string-based error categories standardly used to evaluate the quality of machine translation output allows us to cover the wide range of errors that occur when translating negative sentences (section 3). We report the results of this analysis for a Hierarchical Phrase-Based Model (Chiang, 2007) on a Chinese-to-English translation task (section 4), where we show that all error categories occur to some extent, with scope reordering being the most frequent (section 5).

Addressing question (2) requires connecting the assumptions behind this manual error analysis to errors occurring along the translation pipeline. We therefore complete the analysis by briefly introducing an automatic method to investigate the causes of the errors at decoding time (section 6). Conclusions and future work are reported in sections 7 and 8.
2 Previous Work

In recent years, the automatic recognition of negation has been the focus of considerable work. Following Blanco and Moldovan (2011) and Morante and Blanco (2012), detecting negation is a task of unraveling its structure, i.e., locating its four main components in a text:

• Cue: the word or multi-word unit inherently expressing negation (e.g. 'He is not d...