We propose a family of algorithms for processing nearest neighbor (NN) queries in an integration middleware that provides federated access to numerous loosely coupled, autonomous data sources connected through the Internet. Previous approaches to parallel and distributed NN queries either treated all data sources as relevant or determined the relevant ones in a single step by exploiting additional knowledge of the object counts per data source. Our approach does not require such detailed statistics about the distribution of the data. Instead, it iteratively enlarges and shrinks the set of relevant data sources. Our experiments show that this yields considerable performance benefits with regard to both response time and effort. Additionally, we propose using only moderate parallelism instead of querying all relevant data sources at the same time. As our experiments show, this trades a slightly increased response time for substantially less effort and thus improves the cost-benefit ratio. The proposed algorithms therefore extend the set of known NN algorithms.
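The interplay of iterative source selection and moderate parallelism can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how relevance is decided, so the sketch assumes each source advertises a rectangular coverage area, and all names (`Source`, `min_dist`, `local_knn`, `federated_knn`, `parallelism`) are hypothetical. Sources are contacted in batches of at most `parallelism`, and after each batch the candidate set shrinks to those sources that could still hold an object closer than the current k-th neighbor.

```python
import heapq
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Source:
    """Hypothetical data source: a rectangular coverage area plus its objects."""
    name: str
    lo: Point  # lower-left corner of the assumed coverage rectangle
    hi: Point  # upper-right corner of the assumed coverage rectangle
    objects: list[Point]

    def min_dist(self, q: Point) -> float:
        # Smallest possible distance from q to any point inside the rectangle.
        dx = max(self.lo[0] - q[0], 0.0, q[0] - self.hi[0])
        dy = max(self.lo[1] - q[1], 0.0, q[1] - self.hi[1])
        return math.hypot(dx, dy)

    def local_knn(self, q: Point, k: int) -> list[Point]:
        # Stands in for a remote k-NN request to the source.
        return heapq.nsmallest(k, self.objects, key=lambda p: math.dist(p, q))


def federated_knn(sources, q, k, parallelism=2):
    """Return (k nearest points, names of contacted sources)."""
    pending = sorted(sources, key=lambda s: s.min_dist(q))
    best = []       # (distance, point) pairs, ascending, at most k entries
    contacted = []  # effort measure: which sources were actually queried
    while pending:
        # Until k candidates are known, every source may still be relevant.
        radius = best[-1][0] if len(best) == k else math.inf
        # Shrink: drop sources that cannot contain anything closer.
        pending = [s for s in pending if s.min_dist(q) < radius]
        # Enlarge: contact the next few closest sources. A real middleware
        # would issue this batch concurrently; here it runs sequentially.
        batch, pending = pending[:parallelism], pending[parallelism:]
        for s in batch:
            contacted.append(s.name)
            best.extend((math.dist(p, q), p) for p in s.local_knn(q, k))
        best = sorted(best)[:k]
    return [p for _, p in best], contacted
```

In this sketch, a small `parallelism` lets results from early batches prune later sources, so fewer sources are contacted at the price of more sequential rounds; setting it to the number of sources degenerates to querying everything at once.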