Document clustering usually groups whole documents on the assumption that each one revolves around a single topic. To obtain better clustering results, it is important to account for the fact that a document may deal with more than one topic. Our work proposes a new inter-passage clustering technique that clusters the segments of documents on the basis of their similarity. The input is a collection of multi-topic documents taken from the web, each consisting of several topical segments. SentiWordNet is used to calculate a score for each segment within a document. Based on these segment scores, segment-based clustering is first performed at the intra-document level. Once intra-document segment clustering is complete, the k-means algorithm is applied to the entire collection to perform inter-document clustering, in which similar segments from different documents are grouped into a single cluster. The proposed technique would help organize multi-topic documents efficiently into their corresponding clusters.
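The inter-document step can be illustrated with a minimal sketch. It makes several assumptions that the abstract does not specify. First, each segment is represented by a two-dimensional (positive, negative) score vector averaged over its words. Second, `TOY_LEXICON` is a hypothetical stand-in for SentiWordNet with made-up values, so the example runs without downloading any corpus. Third, the k-means initialisation simply uses the first k points so that the result is deterministic. The intra-document step is omitted, and all function names are illustrative rather than taken from the original work.

```python
import re
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical toy lexicon standing in for SentiWordNet: word -> (pos, neg).
# Values are illustrative only, not real SentiWordNet scores.
TOY_LEXICON: Dict[str, Point] = {
    "good": (0.75, 0.0),
    "great": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "poor": (0.0, 0.5),
}


def segment_score(segment: str) -> Point:
    """Average (positive, negative) lexicon score of the words in a segment."""
    words = re.findall(r"[a-z]+", segment.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return (0.0, 0.0)
    return (sum(p for p, _ in hits) / len(hits), sum(n for _, n in hits) / len(hits))


def _sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, max_iters: int = 100) -> List[int]:
    """Plain k-means; assumes len(points) >= k. Initialised with the first k points."""
    centroids = list(points[:k])
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


def cluster_segments(docs: Dict[str, List[str]], k: int) -> Dict[int, List[Tuple[str, int]]]:
    """Cluster segments from all documents together; returns label -> [(doc_id, segment_index)]."""
    ids = [(d, i) for d, segs in docs.items() for i in range(len(segs))]
    points = [segment_score(docs[d][i]) for d, i in ids]
    labels = kmeans(points, k)
    clusters: Dict[int, List[Tuple[str, int]]] = {}
    for seg_id, label in zip(ids, labels):
        clusters.setdefault(label, []).append(seg_id)
    return clusters


docs = {
    "doc1": ["the food was good and great", "service was bad"],
    "doc2": ["a great view", "poor and bad parking"],
}
print(cluster_segments(docs, 2))
```

In this toy run the positive segments of both documents land in one cluster and the negative segments in the other, which mirrors the idea of grouping similar segments from different documents under a single cluster. A real implementation would score segments with SentiWordNet itself and would likely use a more robust initialisation such as k-means++.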