We introduce a method for learning to translate out-of-vocabulary (OOV) words. The method focuses on combining sublexical/constituent translations of an OOV to generate its translation candidates. In our approach, wildcard searches are formulated based on our OOV analysis, aimed at maximizing the probability of retrieving the OOVs' sublexical translations from the existing resources of Machine Translation (MT) systems. At run-time, translation candidates of the unknown words are generated from their suitable sublexical translations and ranked based on monolingual and bilingual information. We have incorporated the OOV model into a state-of-the-art machine translation system, and experimental results show that our model indeed helps to mitigate the impact of OOVs on translation quality, with significant improvement especially for sentences containing multiple OOVs.
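To make the candidate-generation step concrete, the following is a minimal Python sketch, assuming a toy sublexical translation table and a stand-in language-model score; the segmentation, data, and scoring here are hypothetical and only illustrate how sublexical translations might be combined and ranked with bilingual and monolingual information, not the paper's actual implementation.

```python
from itertools import product

# Hypothetical sublexical translation table: each segment of an OOV maps to
# candidate translations paired with bilingual (translation) probabilities.
SUBLEX_TABLE = {
    "nano": [("nano", 0.6), ("nanoscale", 0.4)],
    "toxicology": [("toxicology", 0.9), ("toxicity studies", 0.1)],
}

def lm_score(phrase):
    """Stand-in for a monolingual language-model score (higher is better)."""
    return 1.0 / len(phrase.split())  # toy heuristic only

def generate_candidates(oov_segments):
    """Combine the segments' sublexical translations and rank the results."""
    candidates = []
    for combo in product(*(SUBLEX_TABLE[seg] for seg in oov_segments)):
        words, probs = zip(*combo)
        translation = " ".join(words)
        bilingual = 1.0
        for p in probs:
            bilingual *= p  # product of the segments' translation probabilities
        # Rank by combining bilingual and monolingual evidence.
        candidates.append((translation, bilingual * lm_score(translation)))
    return sorted(candidates, key=lambda c: c[1], reverse=True)

# Example: candidate translations for a hypothetical OOV split into two segments.
print(generate_candidates(["nano", "toxicology"]))
```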
We introduce a method for learning to predict text and grammatical constructions in a computer-assisted translation and writing framework. In our approach, predictions are offered on the fly to help the user make appropriate lexical and grammatical choices during the translation of a source text, thus improving translation quality and productivity. The method involves automatically generating general-to-specific word usage summaries (i.e., writing suggestion module), and automatically learning high-confidence word- or phrase-level translation equivalents (i.e., translation suggestion module). At runtime, the source text and its translation prefix entered by the user are broken down into n-grams to generate grammar and translation predictions, which are further combined and ranked via translation and language models. These ranked prediction candidates are iteratively and interactively displayed to the user in a pop-up menu as translation or writing hints. We present a prototype writing assistant, TransAhead, that applies the method to a human-computer collaborative environment. Automatic and human evaluations show that novice translators or language learners substantially benefit from our system in terms of translation performance (i.e., translation accuracy and productivity) and language learning (i.e., collocation usage and grammar). In general, our methodology of inline grammar and text predictions or suggestions has great potential in the field of computer-assisted translation, writing, or even language learning.
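As a rough illustration of the runtime step, the sketch below breaks the source text and the translation prefix into n-grams to trigger translation and grammar predictions, and ranks the translation candidates by a product of toy translation-model and language-model scores; the tables, patterns, weights, and scoring are hypothetical placeholders rather than TransAhead's actual models or resources.

```python
# Hypothetical phrase-level translation table: source n-gram -> (target phrase, TM prob).
TRANSLATION_TABLE = {
    "play a role": [("banyan jiaose", 0.5), ("fahui zuoyong", 0.4)],
    "role": [("jiaose", 0.8)],
}

# Hypothetical grammar-pattern suggestions keyed by words in the translation prefix.
PATTERN_TABLE = {
    "play": ["play a/an ADJ role in V-ing", "play a part in NP"],
}

def ngrams(tokens, n_max=3):
    """Enumerate all word n-grams up to length n_max."""
    return [" ".join(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def lm_score(phrase):
    """Stand-in for a target-side language-model score."""
    return 1.0 / (1 + len(phrase.split()))  # toy heuristic only

def predict(source_text, prefix):
    """Trigger translation and grammar predictions from n-grams and rank by TM x LM."""
    ranked = []
    for gram in ngrams(source_text.split()):
        for target, tm_prob in TRANSLATION_TABLE.get(gram, []):
            ranked.append((target, tm_prob * lm_score(target)))
    hints = [p for word in prefix.split() for p in PATTERN_TABLE.get(word, [])]
    return sorted(ranked, key=lambda r: r[1], reverse=True), hints

# Example: predictions for a source sentence and the user's translation prefix so far.
print(predict("this gene may play a role in cancer", "this gene may play"))
```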
Various writing assistance tools have been developed through efforts in natural language processing, with varying degrees of success in curriculum integration depending on their functional rigor and pedagogical design. In this paper, we developed a system, WriteAhead, that provides six types of suggestions while non-native graduate students of English from different disciplines compose journal abstracts, and we assessed its effectiveness. The method involved automatically building domain-specific corpora of abstracts from the Web, using domain names and related keywords as query expansions, and automatically extracting vocabulary and n-grams from these corpora to offer writing suggestions. At runtime, the learner's input in the system's writing area actively triggers a set of corresponding writing suggestions. This abstract-writing assistant facilitates interaction between learners and the system for writing abstracts in an effective and contextualized way, by providing suggestions such as collocations or transitional words. To assess WriteAhead, we compared the writing performance of two groups of students, with and without the system, and collected student perception data. Findings show that the experimental group wrote better abstracts, and most students were satisfied with the system across most suggestion types, as they could effectively compose quality abstracts with the language support WriteAhead provides.
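A bare-bones sketch of such a suggestion pipeline appears below: it counts word n-grams in a small, made-up corpus of abstract sentences and returns the frequent ones that continue the learner's most recently typed word; the corpus, threshold, and matching rule are assumptions for illustration only, not WriteAhead's actual corpora or suggestion logic.

```python
from collections import Counter

# Hypothetical domain-specific corpus of abstract sentences (in practice,
# gathered from the Web via domain names and related keywords as query expansions).
CORPUS = [
    "we propose a novel method for machine translation",
    "we propose an approach to improve translation quality",
    "this paper presents a novel method for parsing",
]

def extract_ngrams(sentences, n=3, min_count=1):
    """Count word n-grams in the corpus; a real system would use a much higher threshold."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

SUGGESTIONS = extract_ngrams(CORPUS)

def suggest(learner_input, k=5):
    """Trigger suggestions whose first word matches the learner's last typed word."""
    last = learner_input.lower().split()[-1]
    hits = [(g, c) for g, c in SUGGESTIONS.items() if g.split()[0] == last]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:k]

# Example: suggestions triggered as the learner types in the writing area.
print(suggest("In this paper we propose"))
```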