Given two geographic databases, a fusion algorithm should produce all pairs of corresponding objects (i.e., objects that represent the same real-world entity). Four fusion algorithms that use only the locations of objects are described, and their performance is measured in terms of recall and precision. These algorithms are designed to work even when locations are imprecise and each database represents only some of the real-world entities. Results of extensive experimentation are presented and discussed. The tests show that performance depends on the density of the data sources and on the degree of overlap among them. All four algorithms are much better than the current state of the art (i.e., the one-sided nearest-neighbor join). One of the four algorithms is the best in all cases, at the cost of a small increase in running time compared with the other algorithms.
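The baseline named above, the one-sided nearest-neighbor join, pairs every object of one source with its closest object in the other, even when that object has no real counterpart. The sketch below contrasts it with a mutual-nearest-neighbor filter, a common location-only alternative. It is an illustration under our own assumptions (planar coordinates, Euclidean distance, hypothetical function names and data), not a reproduction of the four algorithms described in the abstract.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]

def nearest(p: Point, others: Dict[str, Point]) -> str:
    """Id of the object in `others` whose location is closest to p (Euclidean)."""
    return min(others, key=lambda k: math.dist(p, others[k]))

def one_sided_nn_join(a: Dict[str, Point], b: Dict[str, Point]) -> Set[Tuple[str, str]]:
    """Baseline: pair every object of A with its nearest neighbor in B."""
    return {(ka, nearest(pa, b)) for ka, pa in a.items()}

def mutual_nn_join(a: Dict[str, Point], b: Dict[str, Point]) -> Set[Tuple[str, str]]:
    """Hypothetical alternative: keep a pair only if each object is the other's nearest neighbor."""
    pairs = set()
    for ka, pa in a.items():
        kb = nearest(pa, b)
        if nearest(b[kb], a) == ka:
            pairs.add((ka, kb))
    return pairs

# Hypothetical data: a2 has no counterpart in B.
A = {"a1": (0.0, 0.0), "a2": (10.0, 0.0)}
B = {"b1": (0.5, 0.0), "b2": (4.0, 0.0)}

print(sorted(one_sided_nn_join(A, B)))  # [('a1', 'b1'), ('a2', 'b2')]
print(sorted(mutual_nn_join(A, B)))     # [('a1', 'b1')]
```

The one-sided join is forced to match `a2` with `b2`, although `b2` is closer to `a1`; when each source covers only some entities, such forced pairs lower precision.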
When integrating geo-spatial data sets, a join algorithm is used for finding sets of corresponding objects (i.e., objects that represent the same real-world entity). This article investigates location-based join algorithms for the integration of several data sets. First, algorithms for integrating two data sets are presented, and their performance, in terms of recall and precision, is compared. Then, two approaches for integrating more than two data sets are described. In one approach, all the integrated data sets are processed simultaneously. In the second approach, a join algorithm for two data sets is applied sequentially, either in a serial manner, where in each join at least one of the joined data sets is a single source, or in a hierarchical manner, where two join results can be joined. Join algorithms are given for both approaches. The algorithms are designed to perform well even when the locations of objects are imprecise and each data set represents only some of the real-world entities. Results of extensive experiments with the different approaches are provided and analyzed. The experiments show the differences in accuracy and efficiency between the approaches under different circumstances. The results also show that all our algorithms are much more accurate than the commonly used one-sided nearest-neighbor join.
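The serial and hierarchical manners of applying a two-way join can be sketched as follows. This is a hypothetical illustration, not the article's algorithms: we assume the two-way join is a mutual-nearest-neighbor match, that the location of a set of already-joined objects is the centroid of its members, and that unmatched objects are carried forward as singleton sets.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[float, float]
Member = Tuple[str, str]       # (source name, object id)
Group = FrozenSet[Member]      # objects believed to represent one entity

def centroid(group: Group, coords: Dict[Member, Point]) -> Point:
    """Assumed location of a group: the mean of its members' locations."""
    xs = [coords[m][0] for m in group]
    ys = [coords[m][1] for m in group]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def join_two(left: List[Group], right: List[Group], coords: Dict[Member, Point]) -> List[Group]:
    """Two-way join by mutual nearest neighbors; unmatched groups are kept as-is."""
    lc = [centroid(g, coords) for g in left]
    rc = [centroid(g, coords) for g in right]

    def nn(p: Point, cands: List[Point]) -> int:
        return min(range(len(cands)), key=lambda i: math.dist(p, cands[i]))

    result, matched = [], set()
    for i, p in enumerate(lc):
        j = nn(p, rc)
        if nn(rc[j], lc) == i:        # mutual nearest neighbors: merge the groups
            result.append(left[i] | right[j])
            matched.add(j)
        else:
            result.append(left[i])
    result.extend(g for j, g in enumerate(right) if j not in matched)
    return result

def singletons(name: str, objs: Dict[str, Point]) -> List[Group]:
    return [frozenset({(name, oid)}) for oid in objs]

def serial_join(sources: Dict[str, Dict[str, Point]], coords) -> List[Group]:
    """Serial manner: one side of every join is a single source."""
    names = list(sources)
    result = singletons(names[0], sources[names[0]])
    for name in names[1:]:
        result = join_two(result, singletons(name, sources[name]), coords)
    return result

def hierarchical_join(sources: Dict[str, Dict[str, Point]], coords) -> List[Group]:
    """Hierarchical manner: join results are themselves joined, pairwise per level."""
    level = [singletons(n, objs) for n, objs in sources.items()]
    while len(level) > 1:
        nxt = [join_two(level[i], level[i + 1], coords) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

# Hypothetical sources; every source must be non-empty.
sources = {
    "S1": {"x": (0.0, 0.0)},
    "S2": {"y": (0.2, 0.0)},
    "S3": {"z": (0.1, 0.1), "w": (9.0, 9.0)},
}
coords = {(s, oid): p for s, objs in sources.items() for oid, p in objs.items()}
```

With these inputs both manners yield the set {x, y, z} plus the singleton {w}. With more sources the two orders can diverge, because a hierarchical join matches centroids of two multi-source groups rather than a group against raw objects.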
In a geographical route search, given search terms, the goal is to find an effective route that (1) starts at a given location, (2) ends at a given location, and (3) travels via geographical entities that are relevant to the given terms. A route is effective if its length does not exceed a given distance limit and the ranking scores of the visited entities, with respect to the search terms, are maximal. This paper introduces route-search queries, suggests three semantics for such queries, and addresses the problem of efficiently answering queries under the different semantics. Since answering route-search queries generalizes the traveling salesman problem, the problem is unlikely to have an efficient solution; that is, no polynomial-time algorithm solves it unless P = NP. Hence, this work considers heuristics for the problem. Methods for effectively computing routes are presented and compared analytically and experimentally. Experiments on both synthetic and real-world data illustrate the efficiency of these methods and their effectiveness in computing a route that satisfies the constraints of a route-search query.
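To make the problem concrete, here is one simple heuristic, greedy cheapest insertion. It is a hypothetical illustration, not one of the paper's methods. We assume that each search term has a list of candidate entities with a location and a relevance score, that the route visits at most one entity per term, that distances are planar Euclidean, and that a route's score is the sum of the scores of its visited entities.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Entity = Tuple[str, Point, float]  # (entity id, location, relevance score)

def route_length(points: List[Point]) -> float:
    return sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))

def greedy_route(start: Point, end: Point, candidates: Dict[str, List[Entity]], limit: float):
    """Greedy cheapest-insertion heuristic (illustrative, not optimal).

    Repeatedly inserts, for some not-yet-covered search term, the entity and
    position with the best score per unit of added distance, as long as the
    route stays within `limit`. Stops when no feasible insertion remains, so
    some terms may be left uncovered.
    """
    points = [start, end]
    visits: List[Tuple[str, str]] = []  # (term, entity id), aligned with points[1:-1]
    remaining = set(candidates)
    total = 0.0
    while remaining:
        cur = route_length(points)
        best = None
        for term in sorted(remaining):  # sorted for deterministic tie-breaking
            for eid, loc, score in candidates[term]:
                for pos in range(1, len(points)):
                    a, b = points[pos - 1], points[pos]
                    added = math.dist(a, loc) + math.dist(loc, b) - math.dist(a, b)
                    if cur + added > limit:
                        continue
                    ratio = score / (added + 1e-9)
                    if best is None or ratio > best[0]:
                        best = (ratio, term, eid, loc, score, pos)
        if best is None:
            break
        _, term, eid, loc, score, pos = best
        points.insert(pos, loc)
        visits.insert(pos - 1, (term, eid))
        total += score
        remaining.discard(term)
    return visits, total, route_length(points)

# Hypothetical query: from (0, 0) to (10, 0), visiting a cafe and a museum.
candidates = {
    "cafe": [("c1", (5.0, 1.0), 0.9), ("c2", (5.0, 8.0), 1.0)],
    "museum": [("m1", (8.0, -1.0), 0.7)],
}
visits, score, length = greedy_route((0.0, 0.0), (10.0, 0.0), candidates, limit=14.0)
```

Here the higher-scoring cafe `c2` is skipped because the detour would exceed the limit, and the route visits `c1` and then `m1`. Greedy insertion runs in polynomial time but, as expected for an NP-hard problem, gives no optimality guarantee.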
In a route search over geospatial data, a user provides terms for specifying the types of geographical entities that she wishes to visit. The goal is to find a route that (1) starts at a given location, (2) ends at a given location, and (3) travels via geospatial entities that are relevant to the provided search terms. Earlier work studied the problem of finding a route that is effective in the sense that its length does not exceed a given limit, the relevance of the objects is as high as possible, and the route visits a single object from each specified type. This paper investigates route search over probabilistic geospatial data. It is shown that the notion of an effective route requires a new definition; specifically, two alternative semantics are proposed. Computing an effective route is more complicated than in the non-probabilistic case and hence requires new algorithms. Since the problem is NP-hard, heuristic methods for computing an effective route under each of the two semantics are developed. These methods are compared analytically and experimentally. In particular, experiments on both synthetic and real-world data illustrate the efficiency and effectiveness of these methods in computing a route under the two semantics.
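Why the definition of an effective route changes can be seen with a small calculation. The abstract does not spell out the two semantics, so the following is only an assumed model: each visited entity has an independent probability of truly being of its requested type, and a route is scored by the probability that every requested type is matched by at least one visited entity. The function name and data are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

def prob_all_types_covered(required: Iterable[str], visits: Iterable[Tuple[str, float]]) -> float:
    """Probability that every required type is matched by at least one visited entity.

    `visits` lists (type, p) for each entity on the route, where p is the
    assumed probability that the entity really is of that type. Entities
    are assumed to be independent.
    """
    miss: Dict[str, float] = {}
    for t, p in visits:
        # Probability that no visited entity of type t is a true match.
        miss[t] = miss.get(t, 1.0) * (1.0 - p)
    return math.prod(1.0 - miss.get(t, 1.0) for t in required)

required = ["cafe", "museum"]
route_a = [("cafe", 0.5), ("cafe", 0.5), ("museum", 0.8)]  # two uncertain cafes
route_b = [("cafe", 0.9), ("museum", 0.6)]                 # one object per type
```

Under this model `route_a` scores about 0.6 and `route_b` about 0.54, so visiting more than one object of a type can pay off, unlike in the non-probabilistic setting where a single object per type suffices.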
When integrating geo-spatial datasets, a join algorithm is used for finding sets of corresponding objects (i.e., objects that represent the same real-world entity). Algorithms for joining two datasets were studied in the past. This paper investigates the integration of three datasets and proposes methods that can easily be generalized to any number of datasets. Two approaches that use only the locations of objects are presented and compared. In one approach, a join algorithm for two datasets is applied sequentially. In the second approach, all the integrated datasets are processed simultaneously. Join algorithms are given for both approaches, and their performance, in terms of recall and precision, is compared. The algorithms are designed to perform well even when locations are imprecise and each dataset represents only some of the real-world entities. Results of extensive experiments show that one of the algorithms has the best (or close to the best) performance under all circumstances. This algorithm performs much better than sequentially applying the one-sided nearest-neighbor join.
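Recall and precision over sets of corresponding objects can be computed as sketched below. The abstract does not define how partially correct sets are counted, so this sketch assumes exact-match scoring: a produced set counts as correct only if it equals a ground-truth set, and only sets with at least two members count as correspondences. The names and data are hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Member = Tuple[str, str]  # (dataset name, object id)

def recall_precision(result: Iterable[FrozenSet[Member]],
                     truth: Iterable[FrozenSet[Member]]) -> Tuple[float, float]:
    """Exact-match recall and precision over sets of corresponding objects.

    Only sets with two or more members count as correspondences; a produced
    set is correct only if it equals a ground-truth set exactly.
    """
    produced = {g for g in result if len(g) >= 2}
    gold = {g for g in truth if len(g) >= 2}
    correct = len(produced & gold)
    recall = correct / len(gold) if gold else 1.0
    precision = correct / len(produced) if produced else 1.0
    return recall, precision

g1 = frozenset({("A", "a1"), ("B", "b1"), ("C", "c1")})
g2 = frozenset({("A", "a2"), ("C", "c2")})
wrong = frozenset({("A", "a2"), ("B", "b2")})
print(recall_precision([g1, wrong], truth=[g1, g2]))  # (0.5, 0.5)
```

A looser scheme could instead credit each correctly matched pair inside a set; which one is used changes the numbers considerably when three or more datasets are integrated.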