Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminology with the extracted monolingual term lists. We manually evaluate our novel terminology extraction model on English-to-Spanish and English-to-Hindi data sets, and observe excellent performance for all domains. Furthermore, we report the performance of our monolingual terminology extraction model comparing with a number of the state-of-the-art terminology extraction models on the English-to-Hindi datasets.
Abstract. We describe OpenMaTrEx, a free/open-source examplebased machine translation (EBMT) system based on the marker hypothesis, comprising a marker-driven chunker, a collection of chunk aligners, and two engines: one based on a simple proof-of-concept monotone EBMT recombinator and a Moses-based statistical decoder. OpenMaTrEx is a free/open-source release of the basic components of MaTrEx, the Dublin City University machine translation system.
In this paper, we describe the machine translation system in the evaluation campaign of the Fourth Workshop on Statistical Machine Translation at EACL 2009. We describe the modular design of our multi-engine MT system with particular focus on the components used in this participation. We participated in the translation task for the following translation directions: French-English and English-French, in which we employed our multi-engine architecture to translate. We also participated in the system combination task which was carried out by the MBR decoder and Confusion Network decoder. We report results on the provided development and test sets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.