Computers are increasingly used to store documents, and as a result a substantial volume of data is now kept in document form. These documents may be structured, semi-structured, or unstructured. Retrieving useful information from such a large volume of documents is a tedious task. Text mining is a compelling research area because it tries to discover knowledge from unstructured text. This paper gives an overview of the concepts, applications, issues, and tools used in text mining.
Abstract: This paper explains similarity measures and the relationship between knowledge repositories. It also describes the significance of document similarity measures, the algorithms behind them, and the types of text to which each can be applied. Document similarity measures include full-text similarity, paragraph similarity, sentence similarity, semantic similarity, structural similarity, and statistical measures. Two frameworks are proposed in this paper: one measures document-to-document similarity, and the other measures the similarity between one document and multiple documents. Either proposed model can use any one of these similarity measures in its implementation, which is left for further research.
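The two frameworks can be made concrete with one statistical measure. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses cosine similarity over raw term-frequency vectors, a simple lowercase word tokenizer, and, for the one-to-many case, reports per-document scores plus the index of the best match. The function names (`tokenize`, `doc_to_doc`, `doc_to_many`) and the choice of the maximum as the aggregate are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Assumed tokenizer: lowercase the text and keep alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for absent terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def doc_to_doc(doc1: str, doc2: str) -> float:
    """Framework 1 (sketch): similarity between one document and another."""
    return cosine_similarity(Counter(tokenize(doc1)), Counter(tokenize(doc2)))


def doc_to_many(query: str, corpus: list[str]) -> dict:
    """Framework 2 (sketch): score one document against several documents.

    Taking the maximum score as the "best match" is an assumption;
    the paper leaves the aggregation open.
    """
    scores = [doc_to_doc(query, doc) for doc in corpus]
    best = max(range(len(scores)), key=scores.__getitem__) if scores else None
    return {"scores": scores, "best_index": best}


# Usage example with made-up documents.
docs = [
    "text mining discovers knowledge",
    "cooking recipes for pasta",
    "mining text for knowledge discovery",
]
result = doc_to_many("knowledge discovery from text", docs)
print(result["best_index"])  # 2
```

Any of the other listed measures (sentence, semantic, or structural similarity) could replace `cosine_similarity` without changing the two framework functions, which mirrors the paper's point that either model can plug in any measure.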
This research proposes an advanced, automatic approach for finding new products that are similar to previous ones, which speeds up the design of the manufacturing procedure for a new product. The proposed work is based on an advanced ontology-based semantic model that computes similarity between subgraphs efficiently. It builds a hierarchical structure using a new similarity index formed from the overlapping subgraph of two existing product concepts. Using stored data, similarity is calculated by matching the characteristics of the new product against those of existing ones, which helps in discovering knowledge. Experiments on real-time data show low computation cost and high processing speed for similarity computation in a global environment, which the authors take as evidence that the proposed scheme outperforms existing similarity approaches.
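The idea of a similarity index built from the overlapping subgraph of two product concepts can be illustrated with a small sketch. It is not the authors' index: here each product concept is assumed to be a set of ontology edges written as (concept, relation, concept) triples, the overlapping subgraph is taken to be the shared edges, and the index is the Jaccard ratio of shared edges to all edges. The product names, relations, and helper functions are hypothetical.

```python
# Assumed representation: an ontology subgraph is a set of (concept, relation, concept) edges.
Edge = tuple[str, str, str]


def overlapping_subgraph(g1: set[Edge], g2: set[Edge]) -> set[Edge]:
    """Edges present in both product subgraphs."""
    return g1 & g2


def similarity_index(g1: set[Edge], g2: set[Edge]) -> float:
    """Stand-in index: shared edges divided by all edges (Jaccard); 0.0 for two empty graphs."""
    union = g1 | g2
    if not union:
        return 0.0
    return len(overlapping_subgraph(g1, g2)) / len(union)


def rank_stored_products(new_product: set[Edge], stored: dict[str, set[Edge]]) -> list[tuple[str, float]]:
    """Score every stored product against the new one, most similar first."""
    scored = [(name, similarity_index(new_product, graph)) for name, graph in stored.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical stored product concepts.
stored = {
    "bracket_v1": {
        ("product", "hasMaterial", "steel"),
        ("product", "hasProcess", "stamping"),
        ("product", "hasFeature", "hole"),
    },
    "housing_a": {
        ("product", "hasMaterial", "aluminium"),
        ("product", "hasProcess", "casting"),
    },
}
new_product = {
    ("product", "hasMaterial", "steel"),
    ("product", "hasProcess", "stamping"),
    ("product", "hasFeature", "slot"),
}
print(rank_stored_products(new_product, stored))
# [('bracket_v1', 0.5), ('housing_a', 0.0)]
```

The top-ranked stored product would then be the candidate whose existing manufacturing procedure is reused as a starting point for the new design; a weighted or hierarchy-aware overlap could replace the plain Jaccard ratio.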