Bilingual multi-word lexicons help statistical machine translation systems improve their performance. In this paper we present a method for constructing such lexicons for a resource-poor language pair such as Korean-French. By using two parallel corpora that share one pivot language, we can easily construct such lexicons without any external language resource such as a seed dictionary. Experimental results for Korean-to-French show promising accuracy, although this research is ongoing.
Scoring short-answer questions has two disadvantages: grading can take a long time, and scoring consistency can be an issue. To alleviate these disadvantages, automated scoring systems are widely used in America and Europe, whereas in Korea automated scoring has been a subject of research. In this paper, we propose an automated scoring tool for Korean short-answer questions using a semi-supervised learning method. Student answers are analyzed and processed with natural language processing, and unscored answers are automatically scored by machine learning methods. Scored answers with high reliability are then added to the training corpus iteratively and incrementally. In a pilot experiment, the proposed system is evaluated on the Korean and social studies subjects of the Programme for National Student Assessment. We show that processing time and the consistency of grades are promisingly improved. Various assessment methods using the proposed tool still need to be developed before it can be applied to school testing.
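The iterative loop described above (score unscored answers, keep only the reliable ones, retrain) can be sketched as generic self-training. This is a minimal illustration, not the paper's system: the bag-of-words centroid classifier, the margin-based confidence, the `threshold` value and all function names are assumptions made for the example.

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector for a whitespace-tokenized answer (assumed features)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroids(labeled):
    """Sum the bag-of-words vectors of all answers that share a score."""
    cents = {}
    for text, score in labeled:
        cents.setdefault(score, Counter()).update(bow(text))
    return cents

def predict(cents, text):
    """Return (score, confidence); confidence is the gap between the two best similarities."""
    sims = sorted(((cosine(bow(text), c), s) for s, c in cents.items()),
                  key=lambda p: p[0], reverse=True)
    best_sim, best_score = sims[0]
    runner_up = sims[1][0] if len(sims) > 1 else 0.0
    return best_score, best_sim - runner_up

def self_train(labeled, unlabeled, threshold=0.2, max_rounds=10):
    """Repeatedly move confidently scored answers into the training set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)          # retrain on the current training set
        confident, remaining = [], []
        for text in unlabeled:
            score, conf = predict(cents, text)
            (confident if conf >= threshold else remaining).append((text, score))
        if not confident:                   # nothing reliable left to add
            break
        labeled.extend(confident)
        unlabeled = [text for text, _ in remaining]
    return labeled, unlabeled
```

Answers that never reach the confidence threshold stay in the unscored pool, which in practice would be the set handed back to human graders.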
We present a novel iterative approach to bilingual lexicon extraction from comparable corpora. The approach is based on a vector space model for word representation and a modified Perceptron algorithm. It requires a seed dictionary and a large amount of unlabeled training data. The seed dictionary is generated using the pivot-based approach, and the unlabeled training data is dynamically labeled by the modified Perceptron algorithm using a similarity measure during the learning process. In this paper, we extract bilingual lexicons by iteratively applying our proposed approach via the modified Perceptron algorithm. The empirical results show that our proposed approach significantly improves the accuracy for the top 1 candidate. In the future we will try to apply the multilayer Perceptron algorithm to our iterative approach for effective word representation.
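To make the iterative idea concrete, the sketch below wraps a standard ranking perceptron in a self-labeling loop: train on the current dictionary, then add any unlabeled source word whose best target candidate wins by a clear margin. The abstract does not describe how the Perceptron is modified, so everything here is a stand-in under stated assumptions: the element-wise product features, the `margin` value, the initial weights and all names are hypothetical.

```python
def pair_features(x, y):
    """Assumed pair representation: element-wise product of two word vectors."""
    return [a * b for a, b in zip(x, y)]

def score(w, x, y):
    """Weighted similarity between a source vector x and a target vector y."""
    return sum(wi * fi for wi, fi in zip(w, pair_features(x, y)))

def train_perceptron(w, pairs, src, tgt, epochs=5):
    """Standard ranking perceptron: on a mistake, move w toward the gold pair
    and away from the wrongly predicted pair."""
    for _ in range(epochs):
        for s, gold in pairs:
            pred = max(tgt, key=lambda t: score(w, src[s], tgt[t]))
            if pred != gold:
                fg = pair_features(src[s], tgt[gold])
                fp = pair_features(src[s], tgt[pred])
                w = [wi + g - p for wi, g, p in zip(w, fg, fp)]
    return w

def iterative_extraction(seed, src, tgt, rounds=3, margin=0.5):
    """Grow the dictionary from the seed; assumes at least two target words."""
    pairs = dict(seed)
    dim = len(next(iter(src.values())))
    w = [1.0] * dim                      # start from a plain dot product
    for _ in range(rounds):
        w = train_perceptron(w, list(pairs.items()), src, tgt)
        added = False
        for s in src:
            if s in pairs:
                continue
            ranked = sorted(tgt, key=lambda t: score(w, src[s], tgt[t]), reverse=True)
            gap = score(w, src[s], tgt[ranked[0]]) - score(w, src[s], tgt[ranked[1]])
            if gap >= margin:            # label only clear winners
                pairs[s] = ranked[0]
                added = True
        if not added:
            break
    return pairs, w
```

The top 1 candidate for a source word is simply the first element of `ranked`, which is the quantity the reported accuracy refers to.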
This paper presents a method for constructing bilingual multiword lexicons for a resource-poor language pair such as Korean-French. We first identify multiword candidates from parallel corpora and then use the pivot context approach [1] to align those candidates. Our empirical study shows encouraging results in terms of accuracy, even though this study is ongoing.
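The alignment step can be illustrated with a simplified pivot-context sketch: each Korean and French candidate is represented by the pivot-language words it co-occurs with in its own parallel corpus, and candidates are matched by cosine similarity in that shared space. The sentence-level co-occurrence counts, the underscore-joined multiword tokens, the use of English as pivot and all names are assumptions for the example, not details taken from [1].

```python
from collections import Counter, defaultdict
import math

def pivot_vectors(parallel):
    """Build pivot-word context vectors.

    parallel: list of (sentence_tokens, pivot_sentence_tokens) pairs, where a
    multiword candidate is assumed to be pre-joined into one token.
    """
    vecs = defaultdict(Counter)
    for sent, pivot_sent in parallel:
        for word in set(sent):
            vecs[word].update(set(pivot_sent))   # count co-occurring pivot words
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def align(src_vecs, tgt_vecs, top_k=1):
    """Rank target candidates for each source candidate by pivot-space similarity."""
    return {
        s: sorted(tgt_vecs, key=lambda t: cosine(sv, tgt_vecs[t]), reverse=True)[:top_k]
        for s, sv in src_vecs.items()
    }

# Toy Korean-English and French-English corpora sharing English as the pivot.
ko_en = [(["기계_번역"], ["machine", "translation"]), (["학교"], ["school"])]
fr_en = [(["traduction_automatique"], ["machine", "translation"]), (["école"], ["school"])]
lexicon = align(pivot_vectors(ko_en), pivot_vectors(fr_en))
```

Because both vector sets live in the pivot vocabulary, no Korean-French seed dictionary is needed to compare them.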