A distance-mapping algorithm takes a set of objects and a distance metric and maps those objects to a Euclidean or pseudo-Euclidean space in such a way that the distances among objects are approximately preserved. Distance-mapping algorithms are a useful tool for clustering and visualization in data-intensive applications, because they replace expensive distance calculations with sum-of-squares calculations. This can make clustering in large databases with expensive distance metrics practical. In this paper we present five distance-mapping algorithms and conduct experiments to compare their performance in data clustering applications. These include two algorithms called FastMap and MetricMap, and three hybrid heuristics that combine the two algorithms in different ways. Experimental results on both synthetic and RNA data show the superiority of the hybrid algorithms. The results imply that FastMap and MetricMap capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations may be done in minutes.
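As a rough illustration of what a distance-mapping algorithm does, the Python sketch below follows the commonly described FastMap procedure: pick two far-apart pivot objects, place every object along the line through them using the cosine law, then repeat on the residual distances. The function name `fastmap`, its parameters, and the pivot heuristic are assumptions made for illustration; this is not the implementation evaluated in the paper, and MetricMap and the hybrid heuristics are not shown.

```python
import math
import random


def fastmap(objects, dist, k, seed=0):
    """Map each object to k coordinates so that Euclidean distances between
    the coordinates approximate dist(). Illustrative sketch only."""
    rng = random.Random(seed)
    n = len(objects)
    if n == 0:
        return []
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, col):
        # Squared distance left over once the first `col` axes are accounted for.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return d2

    for col in range(k):
        # Pivot heuristic: start at a random object, then jump to the
        # farthest object twice to get a far-apart pair (a, b).
        a = rng.randrange(n)
        b = max(range(n), key=lambda j: residual_sq(a, j, col))
        a = max(range(n), key=lambda j: residual_sq(b, j, col))
        dab2 = residual_sq(a, b, col)
        if dab2 <= 1e-12:
            break  # nothing left to explain; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][col] = (
                residual_sq(a, i, col) + dab2 - residual_sq(b, i, col)
            ) / (2 * dab)
    return coords


if __name__ == "__main__":
    points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
    for original, mapped in zip(points, fastmap(points, math.dist, k=2)):
        print(original, "->", mapped)
```

Once objects have coordinates, a clustering routine can compare them with plain sum-of-squares arithmetic instead of calling the expensive `dist` function for every pair.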
We introduce ImageMap, a method for indexing and similarity searching in Image DataBases (IDBs). ImageMap answers "queries by example" involving any number of objects or regions and taking into account their interrelationships. We adopt the most general image content representation, Attributed Relational Graphs (ARGs), in conjunction with the well-accepted editing distance on ARGs. We tested ImageMap on real and realistic medical images. Our method not only supports visualization of the data set, clustering, and data mining, but also achieves up to 1,000-fold speed-up in search over sequential scanning, with zero or very few false dismissals.
Skyline queries return a set of interesting data points that are not dominated on all dimensions by any other point. Most existing algorithms focus on skyline computation in centralized databases, and some of them can return skyline points progressively upon identification rather than all in a batch. Processing skyline queries over the Web is a more challenging task because in many Web applications the target attributes are stored at different sites and can only be accessed through restricted external interfaces. In this paper, we develop PDS (progressive distributed skylining), a progressive algorithm that evaluates skyline queries efficiently in this setting. The algorithm is also able to estimate the percentage of skyline objects already retrieved, which helps users monitor the progress of long-running skyline queries. Our performance study shows that PDS is efficient, is robust to different data distributions, and achieves its progressive goal with minimal overhead.
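To make the notion of a skyline concrete, the Python sketch below computes one for a small in-memory list and yields each skyline point as soon as it is identified. It is a simple centralized, presorting approach and not the PDS algorithm, which targets attributes spread across sites. The names `dominates` and `progressive_skyline`, the sample data, and the convention that smaller values are better on every dimension are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q if p is no worse on every dimension and strictly
    better on at least one (assumption: smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def progressive_skyline(points: Iterable[Point]) -> Iterator[Point]:
    """Yield skyline points one at a time.

    Points are visited in increasing order of coordinate sum. Any point that
    dominates p has a strictly smaller sum, so it has already been visited;
    p is therefore a skyline point exactly when no skyline point found so
    far dominates it.
    """
    found: List[Point] = []
    for p in sorted(points, key=sum):
        if not any(dominates(s, p) for s in found):
            found.append(p)
            yield p


if __name__ == "__main__":
    # Hypothetical (price, distance) pairs; lower is better on both.
    hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 2)]
    for point in progressive_skyline(hotels):
        print(point)
```

Because the function is a generator, a caller can start consuming results before the scan finishes, which is the progressive behaviour the abstract refers to, here in its simplest single-machine form.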