Document clustering plays a significant role in information retrieval: it seeks to divide documents into groups automatically according to the similarity of their content. A good cluster contains documents that are related to one another (high intra-cluster similarity) and dissimilar to the documents in other clusters (low inter-cluster similarity). Clustering is an unsupervised process that groups documents by identifying their underlying structure, so no correct output needs to be specified for any input. Earlier clustering methods ignore the semantic associations between words, so the context of a document cannot be interpreted correctly. To address this problem, semantic ontology resources such as WordNet have been widely used to enhance text clustering consistency. This paper first proposes an OntoVSM model that reduces the dimensionality of the document representation efficiently. It then proposes the Cover K-means clustering algorithm for semantic document clustering. The proposed algorithm is a hybrid of K-Means and the cover coefficient-based clustering methodology (C3M), improved semantically using the WordNet ontology. Because the dimensionality reduction is based on the semantic knowledge of each term, it preserves the information in the documents. The performance of the proposed work is analysed experimentally, and the results show that it improves on other standard methods.
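To make the K-Means and C3M combination concrete, the following is a minimal sketch of one plausible reading of such a hybrid, not the paper's published procedure. It assumes the C3M decoupling coefficients are used to estimate the number of clusters and to rank seed documents by seed power, and that ordinary K-Means then refines the partition starting from those seeds. The WordNet and OntoVSM steps are omitted: the input matrix `D` is assumed to already hold nonnegative document-by-concept weights. All function names, the duplicate-seed check, and the toy matrix are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def decoupling_coefficients(D: Matrix) -> List[float]:
    """C3M decoupling coefficient of each document: the diagonal c_ii of the
    cover coefficient matrix, c_ii = (1/rowsum_i) * sum_k d_ik^2 / colsum_k."""
    n_terms = len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(row[k] for row in D) for k in range(n_terms)]
    if min(row_sums) <= 0 or min(col_sums) <= 0:
        raise ValueError("every document and every term needs a nonzero weight")
    return [
        sum(row[k] ** 2 / col_sums[k] for k in range(n_terms)) / row_sums[i]
        for i, row in enumerate(D)
    ]


def estimate_cluster_count(D: Matrix) -> int:
    """C3M estimates the number of clusters as the sum of the decoupling
    coefficients; rounding to an integer is a choice made for this sketch."""
    return max(1, round(sum(decoupling_coefficients(D))))


def select_seeds(D: Matrix, k: int) -> List[int]:
    """Rank documents by seed power delta_i * (1 - delta_i) * rowsum_i and take
    the top k, skipping exact duplicates of seeds already chosen (assumption)."""
    deltas = decoupling_coefficients(D)
    power = [d * (1.0 - d) * sum(row) for d, row in zip(deltas, D)]
    ranked = sorted(range(len(D)), key=lambda i: power[i], reverse=True)
    seeds: List[int] = []
    for i in ranked:
        if all(D[i] != D[s] for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return seeds


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cover_seeded_kmeans(D: Matrix, max_iter: int = 100) -> List[int]:
    """Plain K-Means (Lloyd iterations) started from the C3M seed documents."""
    k = estimate_cluster_count(D)
    centroids = [[float(x) for x in D[s]] for s in select_seeds(D, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each document goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(doc, centroids[c]))
            for doc in D
        ]
        if new_labels == labels:
            break  # assignments are stable, so the partition has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [D[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy matrix: documents 0-1 share concepts 0-1, documents 2-3 share concepts 2-3.
D = [
    [3, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 3],
]
print(estimate_cluster_count(D))  # 2
print(select_seeds(D, 2))         # [0, 3]
print(cover_seeded_kmeans(D))     # [0, 0, 1, 1]
```

In this reading, the cover coefficient step removes two usual K-Means inputs, the cluster count and the random initial centroids, so repeated runs on the same matrix give the same partition.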