In peer-to-peer (P2P) networks, computers with equal rights form a logical (overlay) network in order to provide a common service that lies beyond the capacity of every single participant. Efficient similarity search is generally recognized as a frontier in research about P2P systems. One way to address it is to use approaches based on data source selection, in which peers summarize the data they contribute to the network, typically generating one summary per peer. When processing queries, these summaries are used to choose the peers (data sources) that are most likely to contribute to the query result. Only those data sources are contacted. There are two main contributions of this paper. We extend earlier work, adding a data source selection method for high-dimensional vector data and comparing different peer ranking schemes. More importantly, we present a method that uses progressive stepwise data exchange between peers to improve each peer's summary and therefore the system's performance.
In Peer-to-Peer (P2P) networks, computers with equal rights form a logical (overlay) network in order to provide a common service that lies beyond the capacity of each single participant. Among other applications, fielded P2P networks have shown their viability for the distribution and exchange of large amounts of data. Current research on retrieval in P2P systems focuses largely on keyword-based retrieval and other weakly interactive query paradigms. Within this paper, we present a Bayesian image browser that helps the user in finding images in distributed collections. Bayesian image browsers operate by presenting sequences of thumbnail-based collections to the user, at each step collecting user feedback that is used to update a Bayesian model. In contrast to query by example (QBE) no initial query image is needed in order to start the query process. Our approach is scalable in that the state of the Bayesian model is maintained locally in the browsing peer and only a small number of thumbnails is requested from the network at each step. Each query step is thus done in a short time frame. In this paper we present the method, as well as first experiments done using a JXTA-based implementation.
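As a rough illustration of the feedback step described in this abstract, the sketch below applies a plain Bayes update to a belief over candidate target images held locally by the browsing peer. The abstract does not specify the model, so everything here is an assumption: the function name `bayes_update`, the image identifiers, and the likelihood values (how plausible the user's feedback would be if a given image were the target) are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over candidate images: prior * likelihood, renormalized.

    `prior` maps image id -> current probability of being the user's target.
    `likelihood` maps image id -> assumed probability of the observed feedback
    if that image were the target (hypothetical values, not from the paper).
    """
    unnormalized = {img: prior[img] * likelihood.get(img, 0.0) for img in prior}
    total = sum(unnormalized.values())
    if total == 0:
        # The feedback rules out every candidate; keep the old belief unchanged.
        return dict(prior)
    return {img: weight / total for img, weight in unnormalized.items()}


# Hypothetical state kept in the browsing peer.
prior = {"img1": 0.25, "img2": 0.25, "img3": 0.5}
# Hypothetical likelihoods after the user clicks a thumbnail resembling img1.
likelihood = {"img1": 0.8, "img2": 0.2, "img3": 0.2}
posterior = bayes_update(prior, likelihood)
```

Because only the belief dictionary has to be kept between steps, the state stays on the browsing peer, in line with the scalability argument in the abstract.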
In this paper we introduce a simple yet experimentally convincing approach in the research field of source selection for content-based similarity search in P2P networks or, more concretely, in summary-based P2P systems. In these systems, summaries are used for data source selection when performing k-NN queries on distributed collections of documents represented by feature vectors. We introduce a new type of cluster-based summary for source selection that can be calculated and distributed efficiently and cheaply in P2P networks. To generate the summaries, a very large number of sample points is used. Each peer in the network assigns its indexing data to the corresponding closest sample points and publishes its constructed summary. In experiments on real-world image feature data obtained from a large crawl of the Flickr web photo community, we evaluate the quality of these summaries as the number of sample points changes and show that higher numbers of sample points achieve better retrieval performance. Our experiments show that the proposed summaries yield four times better performance than previous methods. Intuitively, this approach has some disadvantages due to the large size of the generated summaries. We show experimentally that these disadvantages can easily be overcome by simple compression techniques, because the generated summaries are sparse.
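The summary construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Euclidean distance, the function names `nearest_sample_index` and `build_summary`, and the toy two-dimensional data are all hypothetical. Storing only non-empty cells is one simple way to exploit the sparsity the abstract mentions.

```python
import math
from collections import Counter


def nearest_sample_index(vector, sample_points):
    """Return the index of the sample point closest to `vector` (Euclidean, assumed)."""
    return min(range(len(sample_points)),
               key=lambda i: math.dist(vector, sample_points[i]))


def build_summary(peer_vectors, sample_points):
    """Count how many of a peer's vectors fall to each sample point.

    Only sample points that receive at least one vector appear in the result,
    so the summary stays small even when there are very many sample points.
    """
    return Counter(nearest_sample_index(v, sample_points) for v in peer_vectors)


# Toy data: four shared sample points and one peer holding four feature vectors.
sample_points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
peer_vectors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (0.0, 0.1)]
summary = build_summary(peer_vectors, sample_points)
```

With the toy data, two of the four sample points receive no vectors and are simply absent from the published summary.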
In peer-to-peer (P2P) networks, computers with equal rights form a logical (overlay) network in order to provide a common service that lies beyond the capacity of every single participant. Efficient similarity search is generally recognized as a frontier in research about P2P systems. One way to address this issue is to use approaches based on data source selection, in which peers summarize the data they contribute to the network, typically generating one summary per peer. When processing queries, these summaries are used to choose the peers (data sources) that are most likely to contribute to the query result. Only those data sources are contacted. There are several contributions of this article. We extend earlier work, adding a data source selection method for high-dimensional vector data and comparing different peer ranking schemes. Furthermore, we present two methods that use progressive stepwise data exchange between peers to improve each peer's summary and therefore the system's performance. Finally, we examine the effect of these data exchange methods with respect to load balancing.
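The selection step described in this abstract (rank peers by their summaries, then contact only the most promising ones) can be sketched as below. The abstract does not say which ranking schemes are compared, so this uses a deliberately simple one as an assumption: each summary is taken to be a map from a region id to the number of the peer's vectors in that region, and peers are ranked by their count in the query's region. The names `rank_peers`, `query_region`, and `max_peers` are hypothetical.

```python
def rank_peers(summaries, query_region, max_peers):
    """Return up to `max_peers` peer ids, most promising first.

    `summaries` maps peer id -> {region id: number of vectors in that region}.
    Peers with no data in the query's region are never contacted.
    """
    scored = [(summary.get(query_region, 0), peer_id)
              for peer_id, summary in summaries.items()]
    scored = [(count, peer_id) for count, peer_id in scored if count > 0]
    # Highest count first; ties are broken by peer id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [peer_id for _, peer_id in scored[:max_peers]]


# Hypothetical summaries published by three peers.
summaries = {
    "peer_a": {0: 5, 1: 1},
    "peer_b": {2: 7},
    "peer_c": {1: 3, 2: 2},
}
selected = rank_peers(summaries, query_region=2, max_peers=2)
```

Here `peer_a` holds nothing in region 2 and is skipped, which is the point of source selection: the query reaches only the peers whose summaries suggest they can contribute.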