Although large-scale grammars are a prerequisite for parsing a wide variety of sentences, such grammars are difficult to build by hand. It is possible, however, to derive a context-free grammar (CFG) automatically from an existing large-scale, syntactically annotated corpus. While this may seem a simple task, CFGs derived in this way have seldom been applied to existing systems, probably because they yield a great number of possible parse results (i.e., high ambiguity). In this paper, we analyze some causes of this high ambiguity and propose a policy for building a large-scale Japanese CFG for syntactic parsing that is capable of reducing ambiguity. We also present an experimental evaluation of the resulting CFG, showing a reduction in the number of parse results it produces (reduced ambiguity) and improved parsing accuracy.
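The basic step of deriving a CFG from an annotated corpus is to read off one production per internal node of every tree and count how often each occurs. The sketch below illustrates only that naive extraction, assuming Penn-style bracketed trees; the toy trees and the function names (`parse`, `extract_cfg`) are hypothetical, and the sketch does not implement the ambiguity-reducing policy proposed in the paper.

```python
from collections import Counter

def tokenize(s: str) -> list[str]:
    # Split a bracketed tree string into "(", ")", labels and words.
    return s.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str], pos: int = 0):
    """Parse one bracketed tree starting at tokens[pos].
    Returns ((label, children), next_pos); children are subtrees or word strings."""
    assert tokens[pos] == "("
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            children.append(tokens[pos])  # a terminal (word)
            pos += 1
    return (label, children), pos + 1

def collect(tree, counts: Counter) -> None:
    # Record the production LHS -> RHS for this node, then recurse into subtrees.
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            collect(c, counts)

def extract_cfg(treebank: list[str]) -> Counter:
    # Map each production (lhs, rhs_tuple) to its frequency in the treebank.
    counts: Counter = Counter()
    for line in treebank:
        tree, _ = parse(tokenize(line))
        collect(tree, counts)
    return counts

# Hypothetical toy treebank (romanized Japanese).
toy = [
    "(S (NP (N taro) (P ga)) (VP (V hashiru)))",
    "(S (NP (N hanako) (P ga)) (VP (V hashiru)))",
]
cfg = extract_cfg(toy)
for (lhs, rhs), n in sorted(cfg.items()):
    print(lhs, "->", " ".join(rhs), n)
```

On a real treebank, this naive extraction produces many near-duplicate rules, which is one way the large number of parse results described above can arise.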
Our research aim is the automatic generation of a researcher's research history from research articles published on the Internet. Research history generation based on the k-Means clustering algorithm has been proposed in previous work; however, its performance is unsatisfactory. We propose a method based on Maximum Margin Clustering (MMC), a relatively new clustering algorithm based on Support Vector Machines (SVMs). MMC has been reported to outperform existing clustering algorithms such as k-Means. In this paper, we describe how to convert articles into vectors using their metadata and how to determine an initial setting for MMC automatically. Experiments show that the MMC-based method achieves a purity of about 0.58 and an entropy of about 0.415. Because higher purity and lower entropy indicate better clustering, this result improves on that of previous work (purity: 0.35, entropy: 0.47).
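Purity and entropy compare cluster assignments against reference classes. The abstract does not state its exact formulas, so the sketch below assumes the commonly used definitions: purity is the fraction of items belonging to the majority class of their cluster, and entropy is the size-weighted average of per-cluster class entropy, normalized by the log of the number of classes so that it lies in [0, 1]. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def _class_counts(clusters: list[int], labels: list[str]) -> dict[int, Counter]:
    # For each cluster id, count how many items of each reference class it holds.
    table: dict[int, Counter] = {}
    for c, y in zip(clusters, labels):
        table.setdefault(c, Counter())[y] += 1
    return table

def purity(clusters: list[int], labels: list[str]) -> float:
    # Fraction of items that belong to the majority class of their cluster.
    table = _class_counts(clusters, labels)
    return sum(max(cnt.values()) for cnt in table.values()) / len(labels)

def entropy(clusters: list[int], labels: list[str]) -> float:
    # Size-weighted mean of per-cluster class entropy, normalized by log(q)
    # where q is the number of reference classes (an assumed convention).
    n = len(labels)
    q = len(set(labels))
    total = 0.0
    for cnt in _class_counts(clusters, labels).values():
        n_r = sum(cnt.values())
        h = -sum((m / n_r) * math.log(m / n_r) for m in cnt.values())
        if q > 1:
            h /= math.log(q)
        total += (n_r / n) * h
    return total

# Toy example: two clusters, each with a 2:1 class split.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "a"]
print(purity(clusters, labels), entropy(clusters, labels))
```

A perfect clustering gives purity 1 and entropy 0, which is why a higher purity and lower entropy, as reported for the MMC-based method, indicate the better result.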