Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers 2016
DOI: 10.18653/v1/w16-2351

It-disambiguation and source-aware language models for cross-lingual pronoun prediction

Abstract: We present our systems for the WMT 2016 shared task on cross-lingual pronoun prediction. The main contribution is a classifier used to determine whether an instance of the ambiguous English pronoun "it" functions as an anaphoric, pleonastic or event reference pronoun. For the English-to-French task the classifier is incorporated in an extended baseline, which takes the form of a source-aware language model. An implementation of the source-aware language model is also provided for each of the remaining language …

Citations: cited by 5 publications (6 citation statements)
References: 13 publications
“…In the context of machine translation, work by Le Nagard and Koehn (2010); Novák et al (2013); Guillou (2015) and Loáiciga et al (2016) have also considered disambiguating the function of the pronoun 'it' in the interest of improving pronoun translation into different languages.…”
Section: Related Work (mentioning)
confidence: 99%
“…The main contribution of the UUPPSALA-PRIMARY system (Loáiciga et al, 2016) for English-French is a Maximum Entropy classifier used to determine whether an instance of the English pronoun "it" functions as an anaphoric, pleonastic, or event reference pronoun. The classifier is trained on a combination of semantic, based on lexical resources such as VerbNet (Schuler, 2005) and WordNet (Miller, 1995), and frequencies computed over the annotated Gigaword corpus (Napoles et al, 2012), syntactic, from the dependency parser in the Mate tools (Bohnet et al, 2013), and contextual features.…”
Section: Uuppsala (mentioning)
confidence: 99%
“…The neural network models evaluate the context in the current and in the preceding sentence of the prediction placeholder (in the target language) and the aligned pronoun (in the source language) with a convolutional layer, followed by max-pooling and a softmax output layer. The n-gram language model is identical to the source-aware n-gram model of Hardmeier (2016) and Loáiciga et al (2016). It makes its prediction using Viterbi decoding over a standard n-gram model.…”
Section: Uu-hardmeier (mentioning)
confidence: 99%
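
The last citation statement mentions Viterbi decoding over an n-gram model to fill pronoun placeholders. A minimal sketch of that idea follows, and it is not the cited systems' implementation. Everything concrete in it is an assumption made for illustration: the bigram order, the `REPLACE` placeholder token, the toy log-probabilities, the floor score for unseen bigrams, and the choice to place the English source pronoun directly before the slot as an ordinary token so that it conditions the prediction.

```python
# Hypothetical toy bigram log-probabilities (invented values, not trained).
TOY_BIGRAMS = {
    ("<s>", "it"): -0.1,
    ("it", "il"): -1.0,
    ("it", "elle"): -1.0,
    ("il", "pleut"): -0.5,
    ("elle", "pleut"): -6.0,
    ("pleut", "</s>"): -0.2,
}


def toy_logprob(prev, word, table=TOY_BIGRAMS, floor=-10.0):
    """Bigram log-probability, with a fixed floor for unseen pairs (assumption)."""
    return table.get((prev, word), floor)


def viterbi_fill(tokens, candidates, logprob, placeholder="REPLACE"):
    """Fill every placeholder with the candidate on the best-scoring path."""
    # Each position offers the observed token, or the candidate set for a slot.
    options = [list(candidates) if t == placeholder else [t] for t in tokens]

    # best maps the last word of a path to (path score, path so far).
    best = {"<s>": (0.0, [])}
    for opts in options:
        new_best = {}
        for word in opts:
            # Keep only the best predecessor for each possible current word.
            score, path = max(
                ((s + logprob(prev, word), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[word] = (score, path + [word])
        best = new_best

    # Close the sentence and pick the overall best path.
    score, path = max(
        ((s + logprob(prev, "</s>"), p) for prev, (s, p) in best.items()),
        key=lambda item: item[0],
    )
    return path, score


if __name__ == "__main__":
    # Source pronoun "it" precedes the slot; "pleut" follows it.
    filled, total = viterbi_fill(
        ["it", "REPLACE", "pleut"], ["il", "elle", "ce"], toy_logprob
    )
    print(filled, total)
```

With a bigram model the dynamic program only needs to remember the best path per last word; a higher-order model would track the last n-1 words instead.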