Extended Abstract

The two most popular approaches to searching for information on the web are web search engines and web directories. Although the performance of search engines improves every day, searching the web can still be a tedious and time-consuming task because of the web's huge size and highly dynamic nature. Moreover, the user's "intention behind the search" is often not clearly expressed, which results in short, overly general queries. The results returned by a search engine can number from hundreds to hundreds of thousands of documents.

One approach to managing this large number of results is clustering. Search results clustering can be defined as the process of automatically grouping search results into thematic groups. In contrast to traditional document clustering, however, clustering of search results is done on the fly (per user query request) and locally, on the limited set of results returned by the search engine. Clustering of search results can help the user navigate a large set of documents more efficiently. By providing concise, accurate descriptions of clusters, it lets the user locate documents of interest faster.

In this paper, we propose an approach to search results clustering based on the Tolerance Rough Set Model, following the work on document clustering [4,3]. Tolerance classes are used to approximate the concepts that exist in documents. The Tolerance Rough Set Model was applied to document clustering as a way to enrich document and cluster representations, with the hope of increasing clustering performance.

The Tolerance Rough Set Model (TRSM) was developed in [3] as a basis for modeling documents and terms in information retrieval, text mining, etc. With its ability to deal with vagueness and fuzziness, TRSM seems to be a promising tool for modeling relations between terms and documents. In many information retrieval problems, defining the similarity relation between document-document, term-term or term-document pairs is essential.

Let $D = \{d_1, \ldots, d_N\}$ be a set of documents and $T = \{t_1, \ldots, t_M\}$ a set of index terms for $D$. TRSM is an approximation space (see [5]) $R = (T, I_\theta, \nu, P)$ determined over the set of terms $T$ (the universe of $R$) as follows:
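
As an illustration, tolerance classes and the approximations built from them can be sketched in Python. This is a minimal sketch under assumptions not stated in the text above: it uses the co-occurrence formulation common in the TRSM literature, where the tolerance class I_θ(t) of a term t contains t itself plus every term that co-occurs with t in at least θ documents, and where the upper and lower approximations of a term set X collect the terms whose tolerance class intersects X or is contained in X, respectively. The function names and the toy corpus are hypothetical.

```python
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class (assumed co-occurrence definition)."""
    cooccurrence = {}
    terms = set()
    for doc in docs:
        unique = set(doc)
        terms |= unique
        # Count each unordered term pair once per document.
        for a, b in combinations(sorted(unique), 2):
            cooccurrence[(a, b)] = cooccurrence.get((a, b), 0) + 1

    # Every term belongs to its own class (reflexivity).
    classes = {t: {t} for t in terms}
    for (a, b), count in cooccurrence.items():
        if count >= theta:
            # The relation is symmetric: add each term to the other's class.
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(term_set, classes):
    """Terms whose tolerance class overlaps term_set."""
    return {t for t, cls in classes.items() if cls & term_set}


def lower_approximation(term_set, classes):
    """Terms whose tolerance class lies entirely inside term_set."""
    return {t for t, cls in classes.items() if cls <= term_set}


# Toy corpus: each document is a list of index terms (hypothetical data).
docs = [
    ["rough", "set", "clustering"],
    ["rough", "set", "tolerance"],
    ["search", "clustering"],
    ["search", "engine"],
]

classes = tolerance_classes(docs, theta=2)
# "rough" and "set" co-occur in two documents, so each is in the other's class.
upper = upper_approximation({"rough"}, classes)  # {"rough", "set"}
lower = lower_approximation({"rough"}, classes)  # empty set
```

In this sketch the upper approximation enriches a sparse term set with related terms, which is the kind of representation enrichment the abstract refers to. The threshold `theta` controls how aggressive that enrichment is.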