Many existing text clustering algorithms ignore the semantic relationships between words and therefore compute text similarity less accurately. This paper proposes a new hybrid text clustering algorithm (HCA) based on HowNet semantics. It computes the semantic similarity of words from their concept descriptions in HowNet and then applies maximum-weight matching on a bipartite graph to obtain a semantics-based text similarity. Using this similarity, HCA combines an improved genetic algorithm with the k-medoids algorithm. Comparative experiments show that (1) HCA produces higher-quality clusters than two existing traditional clustering algorithms, and (2) replacing text cosine similarity with the new semantics-based text similarity significantly improves the clustering quality of all three algorithms.
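The text-similarity step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: HowNet is not queried, so `toy_word_sim` and its `TOY_SIM` table are hypothetical stand-ins for a HowNet-based word similarity, and dividing the matched weight by the longer text's length is an assumed normalization. The words of the shorter text form one side of the bipartite graph, the words of the longer text form the other, and edge weights are word similarities; the Hungarian algorithm, run on negated weights, finds the maximum-weight matching.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical word-similarity table standing in for HowNet-based similarity.
TOY_SIM: Dict[Tuple[str, str], float] = {
    ("car", "automobile"): 0.9,
    ("car", "truck"): 0.6,
    ("fast", "quick"): 0.8,
}


def toy_word_sim(x: str, y: str) -> float:
    """Symmetric lookup in TOY_SIM; identical words score 1.0, unknown pairs 0.0."""
    if x == y:
        return 1.0
    return TOY_SIM.get((x, y), TOY_SIM.get((y, x), 0.0))


def max_weight_assignment(weight: List[List[float]]) -> List[int]:
    """Maximum-weight matching for an n x m weight matrix with n <= m.

    Hungarian algorithm on negated weights: the minimum-cost assignment it
    finds is the maximum-weight one. Returns match[i] = column for row i.
    """
    n, m = len(weight), len(weight[0])
    assert n <= m, "rows must not outnumber columns"
    cost = [[-w for w in row] for row in weight]
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j] = 1-based row matched to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def text_similarity(a: Sequence[str], b: Sequence[str],
                    word_sim: Callable[[str, str], float]) -> float:
    """Sum of word similarities over a maximum-weight matching, normalized."""
    if not a or not b:
        return 0.0
    if len(a) > len(b):
        a, b = b, a  # put the shorter text on the row side
    weight = [[word_sim(x, y) for y in b] for x in a]
    match = max_weight_assignment(weight)
    total = sum(weight[i][j] for i, j in enumerate(match))
    return total / len(b)  # assumed normalization by the longer text


if __name__ == "__main__":
    print(round(text_similarity(["fast", "car"], ["quick", "automobile"], toy_word_sim), 2))
```

Here the matching pairs fast with quick and car with automobile, giving a similarity of about 0.85, whereas cosine similarity over the raw words of these two texts would be 0 because they share no words.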
When a document is represented in the Vector Space Model (VSM), words are usually assumed to have no semantic relationship to one another, that is, to be mutually orthogonal. To make document representation more accurate, this paper proposes a new document description method that introduces concept-words and computes the semantic similarity between words using the HowNet ontology database. Comparative experiments show that the new method not only describes document features in VSM more effectively but also significantly reduces the dimensionality of document vectors. The work is applicable to document clustering, query expansion in Web information retrieval, and personalized services in e-business applications.
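One way the concept-word idea could reduce vector dimensionality is sketched below. The abstract does not say how concept-words are formed, so the greedy threshold grouping, the `threshold` value of 0.7, and the `toy_word_sim` table (a hypothetical stand-in for HowNet-based word similarity) are all assumptions for illustration. Each vocabulary word is mapped to a representative concept-word, and a document's term frequencies are then summed per concept rather than per word.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical word-similarity table standing in for HowNet-based similarity.
TOY_SIM: Dict[Tuple[str, str], float] = {
    ("car", "automobile"): 0.9,
    ("car", "truck"): 0.6,
    ("fast", "quick"): 0.8,
}


def toy_word_sim(x: str, y: str) -> float:
    """Symmetric lookup in TOY_SIM; identical words score 1.0, unknown pairs 0.0."""
    if x == y:
        return 1.0
    return TOY_SIM.get((x, y), TOY_SIM.get((y, x), 0.0))


def build_concepts(vocab: Sequence[str],
                   word_sim: Callable[[str, str], float],
                   threshold: float = 0.7) -> Dict[str, str]:
    """Greedily map each word to a representative concept-word.

    A word joins the first existing concept whose representative is at least
    `threshold` similar to it; otherwise it becomes a new representative.
    """
    reps: List[str] = []
    concept_of: Dict[str, str] = {}
    for word in vocab:
        for rep in reps:
            if word_sim(word, rep) >= threshold:
                concept_of[word] = rep
                break
        else:  # no sufficiently similar concept exists yet
            reps.append(word)
            concept_of[word] = word
    return concept_of


def concept_vector(tokens: Sequence[str], concept_of: Dict[str, str]) -> Counter:
    """Term frequencies aggregated per concept-word (unknown words map to themselves)."""
    return Counter(concept_of.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    vocab = ["car", "automobile", "truck", "fast", "quick"]
    concepts = build_concepts(vocab, toy_word_sim)
    print(concept_vector(["car", "automobile", "quick", "fast", "truck"], concepts))
```

With the toy table, the five words car, automobile, truck, fast, and quick collapse into three concepts (car, truck, fast), so the document vector shrinks from five dimensions to three. Greedy grouping depends on vocabulary order; this sketch trades that sensitivity for simplicity.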
Popular search engines today are built to serve all users in the same way, regardless of any individual user's needs. This paper proposes a personalized query expansion method based on the user's historical Web pages of interest (UHIWPs) and the user's historical query terms (UHQTs). When a user submits a query keyword to a search engine, the algorithm automatically infers that user's implicit search intent and dynamically computes term-term associations from the user's UHIWPs and UHQTs. It then generates personalized expansion terms and submits them to the search engine together with the query keyword. As a result, different users who enter the same query keywords can receive different search results. Experimental results show that this method achieves higher average precision than the existing algorithm.
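The personalization step can be illustrated with a small sketch. The abstract does not name its association measure, so this example substitutes the Dice coefficient over co-occurrence in one user's history, treating each historical page (UHIWP) and each historical query (UHQT) as one context; the function names, the alphabetical tie-breaking rule, and the parameter `k` are hypothetical. Terms that co-occur most strongly with the query keyword in that user's history become the expansion terms.

```python
from typing import Dict, List, Sequence, Set


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient between two sets of context ids."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def expand_query(query: str,
                 history_pages: Sequence[Sequence[str]],
                 history_queries: Sequence[Sequence[str]],
                 k: int = 2) -> List[str]:
    """Return up to k expansion terms for `query` from one user's history."""
    # Each historical page and each historical query counts as one context.
    contexts = [set(p) for p in history_pages] + [set(q) for q in history_queries]
    postings: Dict[str, Set[int]] = {}
    for idx, ctx in enumerate(contexts):
        for term in ctx:
            postings.setdefault(term, set()).add(idx)
    query_ctx = postings.get(query, set())
    if not query_ctx:
        return []  # the keyword never appears in this user's history
    # Term-term association with the keyword, computed per user at query time.
    scores = {t: dice(query_ctx, ids) for t, ids in postings.items() if t != query}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, s in ranked[:k] if s > 0]


if __name__ == "__main__":
    car_fan = ([["jaguar", "car", "engine"], ["jaguar", "car", "speed"]], [["car", "price"]])
    nature_fan = ([["jaguar", "cat", "habitat"], ["jaguar", "cat", "rainforest"]], [["wildlife"]])
    print(expand_query("jaguar", *car_fan))     # ['car', 'engine']
    print(expand_query("jaguar", *nature_fan))  # ['cat', 'habitat']
```

Two users who both type `jaguar` therefore get different expansions: the one whose history is about cars receives `car` and `engine`, while the one whose history is about wildlife receives `cat` and `habitat`.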