A distance-mapping algorithm takes a set of objects and a distance metric and then maps those objects to a Euclidean or pseudoEuclidean space in such a way that the distances among objects are approximately preserved. Distance mapping algorithms are a useful tool for clustering and visualization in data intensive applications, because they replace expensive distance calculations by sum-of-square calculations.This can make clustering in large databases with expensive distance metrics practical.In this paper we present five distance-mapping algorithms and conduct experiments to compare their performance in data clustering applications.These include two algorithms called FastMap and MetricMap, and three hybrid heuristics that combine the two algorithms in different ways. Experimental results on both synthetic and RNA data show the superiority of the hybrid algorithms. The results imply that FastMap and MetricMap capture complementary information about distance metrics and therefore can be used together to great benefit. The net effect is that multi-day computations may be done in minutes. *
We consider the problem of comparing CUAL graphs (Connected, Undirected, Acyclic graphs with nodes being Labeled). This problem is motivated by the study of information retrieval for bio-chemical and molecular databases. Suppose we define the distance between two CUAL graphs G1 and G2 to be the weighted number of edit operations (insert node, delete node and relabel node) to transform G1 to G2. By reduction from exact cover by 3-sets, one can show that finding the distance between two CUAL graphs is NP-complete. In view of the hardness of the problem, we propose a constrained distance metric, called the degree-2 distance, by requiring that any node to be inserted (deleted) have no more than 2 neighbors. With this metric, we present an efficient algorithm to solve the problem. The algorithm runs in time O(N1N2D2) for general weighting edit operations and in time [Formula: see text] for integral weighting edit operations, where Ni, i=1, 2, is the number of nodes in Gi, D=min{d1, d2} and di is the maximum degree of Gi.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.