Wu consider tho problem of estimating CPU (distance computntlons) nnd I/O costs for processing range and k-nearest neighbors qucrics over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performanco of multi-dimensional access methods, in a generic metric space there is no such a possibility, which makes the problem quite different and requires a novel approach. We insist that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method [lo]. Our results rely on the assumption that the indexed dataset comes from a metric space which is "homogeneous" enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the modol ovor both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimlzo a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the up-tree index structure [8].
Skyline queries compute the set of Pareto-optimal tuples in a relation, that is, those tuples that are not
dominated
by any other tuple in the same relation. Although several algorithms have been proposed for efficiently evaluating skyline queries, they either necessitate the relation to have been indexed or have to perform the dominance tests on
all
the tuples in order to determine the result. In this article we introduce salsa, a novel skyline algorithm that exploits the idea of presorting the input data so as to effectively
limit
the number of tuples to be read and compared. This makes salsa also attractive when skyline queries are executed on top of systems that do not understand skyline semantics, or when the skyline logic runs on clients with limited power and/or bandwidth. We prove that, if one considers symmetric sorting functions, the number of tuples to be read is minimized by sorting data according to a “minimum coordinate,” minC, criterion, and that performance can be further improved if data distribution is known and an asymmetric sorting function is used. Experimental results obtained on synthetic and real datasets show that salsa consistently outperforms state-of-the-art sequential skyline algorithms and that its performance can be accurately predicted.
Effective and efficient retrieval of similar shapes from large image databases is still a challenging problem in spite of the high relevance that shape information can have in describing image contents. In this paper, we propose a novel Fourier-based approach, called WARP, for matching and retrieving similar shapes. The unique characteristics of WARP are the exploitation of the phase of Fourier coefficients and the use of the Dynamic Time Warping (DTW) distance to compare shape descriptors. While phase information provides a more accurate description of object boundaries than using only the amplitude of Fourier coefficients, the DTW distance permits us to accurately match images even in the presence of (limited) phase shiftings. In terms of classical precision/recall measures, we experimentally demonstrate that WARP can gain, say, up to 35 percent in precision at a 20 percent recall level with respect to Fourier-based techniques that use neither phase nor DTW distance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.