Document clustering groups related documents into clusters by evaluating the similarity between each document and the representative of each cluster. Terms and their discriminating features drive the clustering, and these features are conventionally derived from term and document frequencies. Feature selection based on frequency statistics alone limits how far a clustering algorithm can be improved, because it does not consider the content of the objects being clustered. In this paper, we adopt a content-based analytic approach to refine the similarity computation and propose a keyword-based clustering algorithm. Experimental results show that content-based keyword weighting outperforms frequency-based weighting.
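As a point of reference, the frequency-based baseline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the smoothed TF-IDF formula, the choice of cosine similarity, and all function names (`tf_idf_vectors`, `cosine`, `assign`) are assumptions made for the example. A content-based keyword weighting scheme would replace the weight computed inside `tf_idf_vectors`; the abstract does not specify that scheme, so it is not shown.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf * (log(n / df) + 1), a common smoothed TF-IDF
    variant; the paper's exact frequency-based formula is not given.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * (math.log(n / df[t]) + 1.0) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign(vectors, representatives):
    """Assign each document to the index of its most similar representative."""
    return [
        max(range(len(representatives)),
            key=lambda k: cosine(vec, representatives[k]))
        for vec in vectors
    ]


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cluster", "document", "similarity"],
    ["cluster", "keyword", "document"],
    ["score", "answer", "student"],
    ["answer", "score", "template"],
]
vectors = tf_idf_vectors(docs)
# Use the first and third documents as the two cluster representatives.
labels = assign(vectors, [vectors[0], vectors[2]])
```

In this toy corpus the two topic groups share no terms, so each document is assigned to the representative from its own group.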
Scoring short-answer questions in a nationwide achievement test for public school students requires considerable human effort and financial expense. Because natural language processing technology can replace manual scoring with automatic scoring software, many researchers have tried to build automatic scoring systems, such as c-rater and e-rater for English. In this paper, we explore a Korean automatic scoring system for short, free-text responses. NLP techniques such as morphological analysis are used to build a token-based scoring template that increases the coverage of the automatic scoring process. In an experiment measuring the efficiency of the system, it covered about 90 to 95% of the student responses, with an agreement rate of 95% with manual scoring.
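The two reported measures, coverage and agreement rate, can be illustrated with a small sketch. Everything here is an assumption made for the example: the template format (a set of required tokens paired with a score), the function names, and the whitespace tokenizer, which merely stands in for Korean morphological analysis. The sketch also assumes that agreement is measured only over responses the system was able to score, which the abstract does not state explicitly.

```python
def tokenize(text):
    """Stand-in for morphological analysis: lowercase whitespace tokens.

    A real Korean system would use a morphological analyzer here; this
    simplification is an assumption of the sketch.
    """
    cleaned = text.lower().replace(".", " ").replace(",", " ")
    return set(cleaned.split())


def auto_score(response, templates):
    """Return the score of the first template whose tokens all appear.

    Returns None when no template matches, i.e. the response is not
    covered and would fall back to manual scoring.
    """
    tokens = tokenize(response)
    for required_tokens, score in templates:
        if required_tokens <= tokens:
            return score
    return None


def coverage_and_agreement(responses, manual_scores, templates):
    """Compute (coverage, agreement) against manual scores.

    coverage  = automatically scored responses / all responses
    agreement = matching scores / automatically scored responses
    """
    auto = [auto_score(r, templates) for r in responses]
    covered = [(a, m) for a, m in zip(auto, manual_scores) if a is not None]
    coverage = len(covered) / len(responses)
    agreement = (
        sum(a == m for a, m in covered) / len(covered) if covered else 0.0
    )
    return coverage, agreement


# Hypothetical templates and responses (English used for readability).
templates = [
    (frozenset({"water", "evaporates"}), 1),
    (frozenset({"water", "freezes"}), 0),
]
responses = [
    "The water evaporates.",
    "Water freezes",
    "It disappears",
    "water evaporates quickly",
]
manual_scores = [1, 0, 1, 0]
coverage, agreement = coverage_and_agreement(responses, manual_scores, templates)
```

Here three of the four responses match a template, and two of those three automatic scores agree with the manual score.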