Abstract-Mining the implicit knowledge in the electronic documents is a critical task in text analysis and data mining. To attain a knowledge-based view of the electronic documents, the clustering method based upon the topic cannot only be used, but also that based upon the extraction can be done. Therefore, a novel method for the clustering of the electronic documents, summarizing of the full text based on the extracted segments, and an evaluation using multi-measures for the importance to the document were presented. In the method, eighteen kinds of named entities and two kinds of syntactical phrases were extracted, and exploited for the text clustering. Then, a novel similarity equation was proposed for the calculation about the extractions. Meantime, three measures for the importance to the document were proposed, which provided a different view for the document's content, and recommended a prior checking for the users. Therefore, the method can improve the efficiency of the knowledge discovery, and enhance the management of the document on the large scale of document collection.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.