Abstract. We present iCluster, a self-organizing peer-to-peer overlay network for supporting full-fledged information retrieval in a dynamic environment. iCluster works by organizing peers sharing common interests into clusters and by exploiting clustering information at query time to achieve low network traffic and high recall. We define the criteria for peer similarity and peer selection, and we present the protocols for organizing the peers into clusters and for searching within the clustered organization of peers. iCluster is evaluated in a realistic peer-to-peer environment using real-world data and queries. The results demonstrate significant performance improvements (in terms of clustering efficiency, communication load and retrieval accuracy) over a state-of-the-art peer-to-peer clustering method. Compared to exhaustive search by flooding, iCluster trades a small loss in retrieval accuracy for far fewer messages.
Semantic overlay networks cluster peers that are semantically, thematically or socially close into groups by means of a rewiring procedure that is periodically executed by each peer. Rewiring proceeds by establishing new connections to similar peers, and by discarding connections that are outdated or point to dissimilar peers. This process aims at improving cluster quality (how well peers with similar content are clustered together) and thereby the flow of information in the network, by reducing the number of messages that are exchanged. Measuring the quality of clustering is therefore an important issue in itself, and it is the issue addressed in this work. In this paper, we introduce a new clustering measure that takes into account the whole neighborhood of a peer (rather than only its direct neighbors), thus providing better insight into the quality of the underlying clustered organisation. Our experimental evaluation with real-world data and queries confirms our assumption that the new measure is better suited for measuring clustering quality than other known measures, such as the (generalised) clustering coefficient.
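The rewiring idea described above can be illustrated with a minimal sketch. It is not the protocol from the paper: the cosine similarity over term-weight profiles, the `threshold` and `max_links` parameters, and the names `cosine` and `rewire` are all assumptions made for illustration, and the handling of outdated links is not modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rewire(peer, neighbors, candidates, profiles, threshold=0.5, max_links=3):
    """Return the neighbor set of `peer` after one hypothetical rewiring round.

    neighbors  -- peers currently linked to `peer`
    candidates -- peers discovered since the last round
    profiles   -- maps each peer id to its term-weight profile (assumed format)
    """
    me = profiles[peer]
    # Consider existing links together with newly discovered peers.
    pool = (set(neighbors) | set(candidates)) - {peer}
    scored = [(cosine(me, profiles[p]), p) for p in pool]
    # Discard links to dissimilar peers (similarity below the threshold).
    kept = [(s, p) for s, p in scored if s >= threshold]
    # Keep only the most similar peers; ties are broken by peer id.
    kept.sort(key=lambda sp: (-sp[0], sp[1]))
    return {p for _, p in kept[:max_links]}


if __name__ == "__main__":
    profiles = {
        "a": {"jazz": 1.0},
        "b": {"jazz": 1.0, "blues": 1.0},
        "c": {"football": 1.0},
        "d": {"jazz": 2.0},
    }
    # Peer "a" is linked to the dissimilar peer "c" and has learned of "b" and "d".
    print(sorted(rewire("a", {"c"}, {"b", "d"}, profiles)))
```

In this toy setting, a peer periodically calling `rewire` would drift toward neighbors with similar profiles, which is the effect a clustering quality measure is meant to capture.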