We hope that workshops such as this one will continue to stimulate work on Discourse and Machine Translation, in a wide range of discourse phenomena and MT architectures.We would like to thank all the authors who submitted papers to the workshop, as well as all the members of the Program Committee who reviewed the submissions and delivered thoughtful, informative reviews.
AbstractWe describe the design, the setup, and the evaluation results of the DiscoMT 2017 shared task on cross-lingual pronoun prediction. The task asked participants to predict a target-language pronoun given a source-language pronoun in the context of a sentence. We further provided a lemmatized target-language human-authored translation of the source sentence, and automatic word alignments between the source sentence words and the targetlanguage lemmata. The aim of the task was to predict, for each target-language pronoun placeholder, the word that should replace it from a small, closed set of classes, using any type of information that can be extracted from the entire document.We offered four subtasks, each for a different language pair and translation direction: English-to-French, Englishto-German, German-to-English, and Spanish-to-English.Five teams participated in the shared task, making submissions for all language pairs. The evaluation results show that all participating teams outperformed two strong n-gram-based language model-based baseline systems by a sizable margin.
IntroductionPronoun translation poses a problem for machine translation (MT) as pronoun systems do not map well across languages, e.g., due to differences in gender, number, case, formality, or humanness, as well as because of language-specific restrictions about where pronouns may be used. For example, when translating the English it into French an MT system needs to choose between il, elle, and cela, while translating the same pronoun into German would require a choice between er, sie, and es. This is hard as selecting the correct pronoun may need discourse analysis as well as linguistic and world knowledge. Null subjects in pro-drop languages pose additional challenges as they express person and number within the verb's morphology, rendering a subject pronoun or noun phrase redundant. Thus, translating from such languages requires generating a pronoun in the target language for which there is no pronoun in the source.Pronoun translation is known to be challenging not only for MT in general, but also for Statistical Machine Translation (SMT) in particular (Le Nagard and Koehn, 2010;Hardmeier and Federico, 2010; Nov谩k, 2011;. Phrase-based SMT (Koehn et al., 2013) was state of the art until recently, but it is gradually being replaced by Neural Machine Translation, or NMT, (Cho et al., 2014; Sutskever et al., 2014; Bahdanau et al., 2015;Luong et al., 2015). 1 NMT yields generally higher-quality translation, but is harder to analyze, and thus little is known about how well it handles pronoun translation. Yet, it is clear that it has access to larger context compa...