This paper reports the latest performance of components and features of a project named Corpus-Centered Computation (C3), which targets a translation technology suitable for spoken language translation. C3 places corpora at the center of the technology. Translation knowledge is extracted from corpora by both EBMT and SMT methods, translation quality is gauged by referring to corpora, the best translation among multiple-engine outputs is selected based on corpora, and the corpora themselves are paraphrased or filtered by automated processes.
Example-based machine translation (EBMT) is based on a bilingual corpus. In EBMT, sentences similar to an input sentence are retrieved from a bilingual corpus, and output is then generated from the translations of those similar sentences. A similarity measure between the input sentence and each sentence in the bilingual corpus is therefore essential for EBMT: if retrieval misses some similar sentences, translation quality drops. In this paper, we describe a method to acquire synonymous expressions from a bilingual corpus and utilize them to expand the retrieval of similar sentences. Synonymous expressions are acquired from the differences between synonymous sentences, and synonymous sentences are clustered by the equivalence of their translations. Our method has the advantage of not relying on rich linguistic knowledge, such as sentence structure and dictionaries. We demonstrate the effect of applying our method to a simple EBMT system.
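The two-step idea above — cluster source sentences that share a translation, then take the differing spans within a cluster as candidate synonymous expressions — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the helper names and the use of `difflib` are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def cluster_by_translation(pairs):
    """Group source sentences whose target translations are identical.

    pairs: iterable of (source_tokens, target_sentence).
    Returns only clusters with two or more members, since a singleton
    yields no synonymous differences.
    """
    clusters = defaultdict(list)
    for src, tgt in pairs:
        clusters[tgt].append(src)
    return [c for c in clusters.values() if len(c) > 1]

def differing_spans(a, b):
    """Return the word spans where two synonymous sentences differ.

    These differing spans are the candidate synonymous expressions
    (a simplification: the paper works over a full bilingual corpus,
    not a single sentence pair).
    """
    sm = SequenceMatcher(None, a, b)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for op, i1, i2, j1, j2 in sm.get_opcodes() if op == "replace"]
```

For example, if "could you check it" and "would you check it" share the same translation, `differing_spans` extracts the pair ("could", "would") as a synonym candidate.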
In this paper, we propose incorporating similar sentence retrieval into machine translation to improve the translation of hard-to-translate input sentences. If a given input sentence is hard to translate, a sentence similar to it is retrieved from a monolingual corpus of translatable sentences and provided to the MT system instead of the original sentence. This method is advantageous in that it relies only on a monolingual corpus. The similarity between an input sentence and each sentence in the corpus is determined from the ratio of common N-grams. We use two conditions to improve retrieval precision and add a filtering method to avoid inappropriate sentences. An experiment using a Japanese-to-English MT system in a travel conversation domain shows that our method improves the translation quality of hard-to-translate input sentences by 9.8%.
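A similarity score based on the ratio of common N-grams can be sketched as below. The exact formula in the paper may differ; this version uses a Dice-style overlap over token bigrams, and the function names are illustrative.

```python
def ngrams(tokens, n):
    """All contiguous N-grams of the token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_similarity(a, b, n=2):
    """Ratio of N-grams shared by two token lists.

    Assumption: a Dice coefficient (2 * common / total), which is one
    common way to realize 'the ratio of common N-grams'.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    common = sum(min(ga.count(g), gb.count(g)) for g in set(ga))
    return 2 * common / (len(ga) + len(gb))
```

An input sentence would be compared against every sentence in the corpus of translatable sentences, and a highly similar one substituted before translation.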
This paper presents a method for acquiring synonyms from monolingual comparable text (MCT). MCT denotes a set of monolingual texts whose contents are similar and which can be obtained automatically. Our acquisition method takes advantage of a characteristic of MCT: the included words and their relations are confined. Our method uses contextual information from the single word on each side of the target words. To improve acquisition precision, a filter called prevention of outside appearance is applied. This method has the advantages that it requires only part-of-speech information and that it can acquire infrequent synonyms. We evaluated our method with two kinds of news article data: sentence-aligned parallel texts and document-aligned comparable texts. On the former data, our method acquires synonym pairs with 70.0% precision. Re-evaluation of incorrect word pairs against the source texts indicates that the method captures the appropriate parts of the source texts with 89.5% precision. On the latter data, acquisition precision reaches 76.0% in English and 76.3% in Japanese.
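The core matching step — treating words from paired comparable texts as synonym candidates when they share a one-word-window context — might look like the sketch below. This is a hypothetical simplification: the paper additionally uses part-of-speech information and the prevention-of-outside-appearance filter, neither of which is modeled here.

```python
from collections import defaultdict

def context_pairs(tokens):
    """Map each word to its set of (left, right) one-word contexts."""
    ctx = defaultdict(set)
    for i in range(1, len(tokens) - 1):
        ctx[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    return ctx

def synonym_candidates(text_a, text_b):
    """Candidate synonym pairs across two comparable texts.

    Two distinct words are paired when they occur in at least one
    identical (left, right) context (an assumption about how the
    one-word-window criterion is applied).
    """
    ca, cb = context_pairs(text_a), context_pairs(text_b)
    return {(wa, wb)
            for wa, sa in ca.items() for wb, sb in cb.items()
            if wa != wb and sa & sb}
```

Because the criterion needs only one shared context occurrence, it can surface infrequent synonyms, consistent with the advantage the abstract claims.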
Example-based machine translation (EBMT) is a promising translation method for speech-to-speech translation because of its robustness. It retrieves example sentences similar to the input and adjusts their translations to obtain the output. However, its performance degrades when input sentences are long and when the style of the inputs differs from that of the example corpus. This paper proposes a method for retrieving "meaning-equivalent sentences" to overcome these two problems. A meaning-equivalent sentence shares the main meaning with an input despite lacking some unimportant information, and its translation corresponds to a "rough translation." The retrieval is based on content words, modality, and tense.
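Retrieval based on content words, modality, and tense could be sketched as follows: require the modality and tense of an example to match the input, then rank by content-word overlap. The Jaccard scoring and the function signature are assumptions for illustration; the paper's actual weighting is more elaborate.

```python
def retrieve_meaning_equivalent(input_cw, input_mod, input_tense, examples):
    """Return the example sentence that best matches the input.

    input_cw: set of content words of the input.
    examples: iterable of (content_words, modality, tense, sentence).
    Examples whose modality or tense disagree with the input are
    skipped; survivors are ranked by Jaccard overlap of content words
    (a hypothetical scoring choice).
    """
    best, best_score = None, 0.0
    for cw, mod, tense, sentence in examples:
        if mod != input_mod or tense != input_tense:
            continue
        score = len(input_cw & cw) / max(len(input_cw | cw), 1)
        if score > best_score:
            best, best_score = sentence, score
    return best
```

Because only the main content words must match, an example can be retrieved even when the input carries extra, unimportant information — the "meaning-equivalent" case the abstract describes.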