We describe a method for incorporating syntactic information in statistical machine translation systems. The first step of the method is to parse the source language string that is being translated. The second step is to apply a series of transformations to the parse tree, effectively reordering the surface string on the source language side of the translation system. The goal of this step is to recover an underlying word order that is closer to the target language word order than that of the original string. The reordering approach is applied as a pre-processing step in both the training and decoding phases of a phrase-based statistical MT system. We describe experiments on translation from German to English, showing a statistically significant improvement from a 25.2% Bleu score for a baseline system to a 26.8% Bleu score for the system with reordering.
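The parse-then-reorder idea in the abstract above can be illustrated with a small sketch. It is not the authors' rule set: the `Node` class, the labels `S-SUB`, `COMP`, `NP`, and `V`, and the single rule `move_final_verb` are hypothetical, chosen only to show how a transformation on a source-side parse tree changes the surface word order before translation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A parse-tree node; children are sub-nodes or word strings."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Read the surface string off the tree, left to right."""
    words: List[str] = []
    for child in node.children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def move_final_verb(node: Node) -> Node:
    """Hypothetical rule: in a subordinate clause (label 'S-SUB'),
    move a clause-final verb to directly after the first NP child."""
    # Apply the rule bottom-up so nested clauses are handled first.
    for child in node.children:
        if isinstance(child, Node):
            move_final_verb(child)
    if node.label == "S-SUB" and node.children:
        last = node.children[-1]
        if isinstance(last, Node) and last.label == "V":
            np_idx = next(
                (i for i, c in enumerate(node.children)
                 if isinstance(c, Node) and c.label == "NP"),
                None,
            )
            if np_idx is not None and np_idx < len(node.children) - 1:
                node.children.pop()
                node.children.insert(np_idx + 1, last)
    return node


# German subordinate clause: "dass ich das Buch lese"
tree = Node("S-SUB", [
    Node("COMP", ["dass"]),
    Node("NP", ["ich"]),
    Node("NP", ["das", "Buch"]),
    Node("V", ["lese"]),
])
reordered = leaves(move_final_verb(tree))
print(" ".join(reordered))  # dass ich lese das Buch
```

The reordered string follows English-like verb placement ("that I read the book"). As the abstract states, the same preprocessing would be applied to the source side in both training and decoding.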
I argue for a novel model of feature valuation in the CI interface and explore under what circumstances a syntactic feature is semantically interpretable. As the groundwork for the investigation, I propose an explicit Distributed Morphology model of Italian nouns of profession. The data provide evidence that the morphology accesses the narrow-syntax representation at two different temporal points within a phase: the earlier point (Spell-Out) returns a morphological realization faithful to feature values present in narrow syntax, while the later point (Transfer) allows for a narrow-syntax representation to be enriched by the CI component. Thus, there is no syntactic distinction between interpretable and uninterpretable features: a syntactic feature appears to be interpretable only if it has been licensed by the CI interface.
This paper proposes a statistical, tree-to-tree model for producing translations. Two main contributions are as follows: (1) a method for the extraction of syntactic structures with alignment information from a parallel corpus of translations, and (2) use of a discriminative, feature-based model for prediction of these target-language syntactic structures, which we call aligned extended projections, or AEPs. An evaluation of the method on translation from German to English shows similar performance to the phrase-based model of Koehn et al. (2003).
This chapter argues, closely following the insights of Berge (2011), that the ergative clause structure of the Inuit language is conditioned by information structure properties, more precisely by its topic-comment properties. It articulates a formal model in which the morphosyntactic properties result from this information structure trigger. Furthermore, it shows that the model correctly accounts not only for the split case and agreement properties of the Inuit language but also for other relevant properties discussed in the literature, i.e., scope properties of objects and aspect. It is also argued that objects in this language are introduced through an applicative head (Basilico 2012), after which they either topicalize or are assigned oblique case.