Soufyane El Allali scite author profile

Henrich

et al. 2006

In peer-to-peer (P2P) networks, computers with equal rights form a logical (overlay) network in order to provide a common service that lies beyond the capacity of every single participant. Efficient similarity search is generally recognized as a frontier in research about P2P systems. One way to address it is to use approaches based on data source selection, where peers summarize the data they contribute to the network, typically generating one summary per peer. When processing queries, these summaries are used to choose the peers (data sources) that are most likely to contribute to the query result. Only those data sources are contacted. There are two main contributions of this paper. We extend earlier work, adding a data source selection method for high-dimensional vector data and comparing different peer ranking schemes. More importantly, we present a method that uses progressive stepwise data exchange between peers to improve each peer's summary and therefore the system's performance.
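The selection step described in this abstract, ranking peers by their summaries and contacting only the most promising ones, can be sketched as follows. This is a minimal illustration, not the ranking scheme from the paper: it assumes each peer's summary is a single centroid vector and ranks peers by Euclidean distance to the query. The names `select_peers` and `summaries` are hypothetical.

```python
from math import dist

def select_peers(query, summaries, k):
    """Return the ids of the k peers whose summary lies closest to the query.

    Assumption: each summary is one centroid vector per peer. The abstract
    does not specify the summary format or the ranking function.
    """
    ranked = sorted(summaries, key=lambda peer: dist(query, summaries[peer]))
    return ranked[:k]

# Hypothetical one-summary-per-peer table, as held by the querying peer.
summaries = {
    "peer_a": (0.1, 0.2),
    "peer_b": (0.8, 0.9),
    "peer_c": (0.5, 0.5),
}

# Only the selected peers would be contacted for the actual query.
selected = select_peers((0.75, 0.8), summaries, k=2)
print(selected)
```

Keeping the ranking function separate from the summary table makes it easy to swap in other peer ranking schemes for comparison.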

Image Data Source Selection Using Gaussian Mixture Models

Allali

et al. 2008

Hunt the Cluster: A Scalable, Interactive Time Bayesian Image Browser for P2P Networks

Allali

et al. 2007

In Peer-to-Peer (P2P) networks, computers with equal rights form a logical (overlay) network in order to provide a common service that lies beyond the capacity of each single participant. Among other applications, fielded P2P networks have shown their viability for the distribution and exchange of large amounts of data. Current research on retrieval in P2P systems focuses largely on keyword-based retrieval and other weakly interactive query paradigms. Within this paper, we present a Bayesian image browser that helps the user in finding images in distributed collections. Bayesian image browsers operate by presenting sequences of thumbnail-based collections to the user, at each step collecting user feedback that is used to update a Bayesian model. In contrast to query by example (QBE), no initial query image is needed in order to start the query process. Our approach is scalable in that the state of the Bayesian model is maintained locally in the browsing peer and only a small number of thumbnails is requested from the network at each step. Each query step is thus done in a short time frame. In this paper we present the method, as well as first experiments done using a JXTA-based implementation.
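The feedback loop described in this abstract, showing thumbnails, collecting a choice, and updating a Bayesian model kept locally, can be sketched as below. This is an illustration under stated assumptions, not the paper's model: it assumes the user picks the displayed image most similar to an unknown target, with choice probability given by a softmax over negative feature distance. The function `update_beliefs` and the `sharpness` parameter are hypothetical.

```python
from math import dist, exp

def update_beliefs(beliefs, features, shown, chosen, sharpness=5.0):
    """One Bayesian update after the user picks `chosen` among `shown`.

    beliefs:  dict image_id -> prior probability that the image is the target
    features: dict image_id -> feature vector
    Assumed user model: softmax over negative distance to the target.
    """
    posterior = {}
    for target, prior in beliefs.items():
        weights = {
            s: exp(-sharpness * dist(features[s], features[target]))
            for s in shown
        }
        likelihood = weights[chosen] / sum(weights.values())
        posterior[target] = prior * likelihood
    total = sum(posterior.values())
    return {image: p / total for image, p in posterior.items()}

features = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (1.0, 1.0), "d": (0.9, 1.0)}
beliefs = {image: 0.25 for image in features}  # uniform prior, no query image

# The user is shown thumbnails "a" and "c" and clicks "a".
beliefs = update_beliefs(beliefs, features, shown=["a", "c"], chosen="a")
```

In this sketch the belief state lives entirely in the browsing peer. Only the thumbnails in `shown` would need to be fetched from the network at each step.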

Sample-based creation of peer summaries for efficient similarity search in scalable peer-to-peer networks

Allali

Mueller

et al. 2007

In this paper we introduce a simple yet experimentally convincing approach in the research field of source selection for content-based similarity search in P2P networks or, more concretely, in summary-based P2P systems. In these systems, summaries are used for data source selection when performing k-NN queries on distributed collections of documents represented by feature vectors. We introduce a new type of cluster-based summary for source selection that can be calculated and distributed efficiently and cheaply in P2P networks. For summary generation, a very large number of sample points is used. Each peer in the network assigns its indexing data to the corresponding closest sample points and publishes the resulting summary. We evaluate the quality of these summaries when changing the number of sample points used in experiments on real-world image feature data obtained from a large crawl of the Flickr web photo community and show that higher numbers of sample points yield better retrieval performance. Our experiments show that the proposed summaries yield four times better performance than previous methods. Intuitively, there are some disadvantages to this approach due to the large size of the generated summaries. We show experimentally that these disadvantages can easily be overcome with simple compression techniques, owing to the sparse nature of the generated summaries.
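The summary construction described in this abstract can be sketched as follows. This is a minimal illustration under assumptions: the sample points are taken to be shared by all peers, Euclidean distance is used, and the sparse form simply stores the non-zero counts. The names `build_summary` and `to_dense` are hypothetical, and the paper's actual compression technique is not specified here.

```python
from collections import Counter
from math import dist

def build_summary(vectors, sample_points):
    """Count how many of a peer's vectors fall closest to each sample point.

    Returns a sparse summary: sample-point index -> count, non-zero only.
    """
    counts = Counter()
    for v in vectors:
        nearest = min(
            range(len(sample_points)),
            key=lambda i: dist(v, sample_points[i]),
        )
        counts[nearest] += 1
    return dict(counts)

def to_dense(summary, n_sample_points):
    """Expand a sparse summary back to one count per sample point."""
    return [summary.get(i, 0) for i in range(n_sample_points)]

# Assumed to be known to every peer in the network.
sample_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Feature vectors held by one peer.
peer_vectors = [(0.1, 0.1), (0.2, 0.0), (0.9, 0.95)]

summary = build_summary(peer_vectors, sample_points)
print(summary)                                # {0: 2, 3: 1}
print(to_dense(summary, len(sample_points)))  # [2, 0, 0, 1]
```

With many sample points and comparatively few vectors per peer, most counts are zero, which is why storing only the non-zero entries keeps the published summary small.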

Clustering-Based, Load Balanced Source Selection for CBIR in P2P Networks

Eisenhardt

Int. J. Semantic Computing

et al. 2008

In peer-to-peer (P2P) networks, computers with equal rights form a logical (overlay) network in order to provide a common service that lies beyond the capacity of every single participant. Efficient similarity search is generally recognized as a frontier in research about P2P systems. One way to address this issue is to use approaches based on data source selection, where peers summarize the data they contribute to the network, typically generating one summary per peer. When processing queries, these summaries are used to choose the peers (data sources) that are most likely to contribute to the query result. Only those data sources are contacted. There are several contributions of this article. We extend earlier work, adding a data source selection method for high-dimensional vector data and comparing different peer ranking schemes. Furthermore, we present two methods that use progressive stepwise data exchange between peers to improve each peer's summary and therefore the system's performance. Finally, we examine the effect of these data exchange methods with respect to load balancing.