This paper presents a new method for constructing bilingual lexicons through a pivot language. The proposed method is adapted from the context-based approach, called the standard approach, which is well known for building bilingual lexicons from comparable corpora. The main difference between the two lies in how context vectors are represented: the standard approach represents them in the target language, whereas the proposed method represents them in a pivot language. This makes the proposed method considerably simpler than the standard approach. Furthermore, the proposed method is more accurate than the standard approach because it uses parallel corpora instead of comparable corpora. The experiments are conducted on one language pair, Korean and Spanish. Our experimental results show that the proposed method is attractive when a parallel corpus directly between the source and target languages is unavailable but both source-pivot and pivot-target parallel corpora are available.
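The idea of representing context vectors in a pivot language can be illustrated with a minimal sketch. It is not the authors' implementation: the raw co-occurrence counts, the cosine similarity, the function names (`context_vectors`, `cosine`, `translate`), and the toy Korean-English and Spanish-English sentence pairs are all assumptions made for illustration, with English standing in as the pivot.

```python
import math
from collections import Counter, defaultdict


def context_vectors(aligned_pairs):
    """Map each word to counts of pivot words seen in its aligned sentences.

    aligned_pairs: iterable of (tokens, pivot_tokens) from a parallel corpus.
    Raw counts are an assumption; the abstract does not name the weighting.
    """
    vectors = defaultdict(Counter)
    for tokens, pivot_tokens in aligned_pairs:
        for word in set(tokens):
            vectors[word].update(pivot_tokens)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def translate(word, src_vecs, tgt_vecs, top_k=3):
    """Rank target words by similarity of their pivot-language vectors."""
    src = src_vecs.get(word, Counter())
    scored = [(cosine(src, vec), cand) for cand, vec in tgt_vecs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in scored[:top_k]]


# Hypothetical toy corpora: source-pivot (Korean-English) and
# target-pivot (Spanish-English).
ko_en = [(["집"], ["house"]), (["개"], ["dog"]), (["집", "개"], ["house", "dog"])]
es_en = [(["casa"], ["house"]), (["perro"], ["dog"]),
         (["casa", "perro"], ["house", "dog"])]

ko_vecs = context_vectors(ko_en)
es_vecs = context_vectors(es_en)
print(translate("집", ko_vecs, es_vecs, top_k=1))
```

Because both vector sets live in the same pivot vocabulary, they can be compared directly, with no seed dictionary needed to translate context vectors from one language into the other.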
Bilingual multi-word lexicons help statistical machine translation systems improve their performance. In this paper we present a method for constructing such lexicons for a resource-poor language pair such as Korean-French. By using two parallel corpora that share one pivot language, we can construct such lexicons without any external language resource such as a seed dictionary. The experimental results for Korean to French (KR to FR) show promising accuracy, although this research is still ongoing.
Scoring short-answer questions has two disadvantages: grading can take a long time, and scores can be inconsistent. To alleviate these disadvantages, automated scoring systems are widely used in America and Europe, whereas in Korea automated scoring has so far been a subject of research. In this paper, we propose an automated scoring tool for Korean short-answer questions that uses a semi-supervised learning method. Student answers are analyzed and processed with natural language processing, and unmarked answers are scored automatically by machine learning methods. Scored answers with high reliability are then added to the training corpus iteratively and incrementally. In a pilot experiment, the proposed system is evaluated on the Korean and social studies subjects of the Programme for National Student Assessment. We show that processing time and grading consistency improve promisingly. Various assessment methods that use the proposed tool still need to be developed before it is applied to school testing.
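The iterative step described above, in which reliably scored answers are added back to the training corpus, resembles self-training. The sketch below shows that loop under stated assumptions: the word-overlap classifier is a stand-in for the unspecified machine learning method, and the function names, the 0.8 confidence threshold, and the toy English answers are hypothetical.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Build per-label word counts, a stand-in for the real classifier."""
    model = {}
    for text, label in labeled:
        model.setdefault(label, Counter()).update(tokenize(text))
    return model


def predict(model, text):
    """Return (label, confidence); confidence is the winner's share of overlap."""
    words = tokenize(text)
    scores = {label: sum(counts[w] for w in words)
              for label, counts in model.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(sorted(scores), key=scores.get)
    return best, scores[best] / total


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the training set with high-confidence predictions each round."""
    labeled, pending = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        confident, rest = [], []
        for text in pending:
            label, conf = predict(model, text)
            if conf >= threshold:
                confident.append((text, label))
            else:
                rest.append(text)
        if not confident:
            break  # nothing reliable enough to add; stop iterating
        labeled.extend(confident)
        pending = rest
    return train(labeled), labeled, pending


# Hypothetical scored and unscored answers.
scored = [("photosynthesis uses sunlight", "correct"),
          ("plants eat soil", "incorrect")]
unscored = ["plants use sunlight",
            "photosynthesis needs sunlight energy",
            "energy from soil"]

model, grown, leftover = self_train(scored, unscored)
print(len(grown), leftover)
```

Answers that never reach the threshold stay in `leftover`, which in a practical setting would presumably be routed to a human grader.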
We present a novel iterative approach to bilingual lexicon extraction from comparable corpora. The approach is based on a vector space model for word representation and a modified Perceptron algorithm. It requires a seed dictionary and a large amount of unlabeled training data. The seed dictionary is generated using the pivot-based approach, and the unlabeled training data is labeled dynamically by the modified Perceptron algorithm using a similarity measure during the learning process. Bilingual lexicons are then extracted by applying this procedure iteratively. The empirical results show that our proposed approach significantly improves accuracy for the top-1 candidate. In the future we will try to apply the multilayer Perceptron algorithm to our iterative approach for more effective word representation.