A common approach for dealing with large datasets is to stream over the input in one pass, and perform computations using sublinear resources. For truly massive datasets, however, even making a single pass over the data is prohibitive. Therefore, streaming computations must be distributed over many machines. In practice, obtaining significant speedups using distributed computation has numerous challenges including synchronization, load balancing, overcoming processor failures, and data distribution. Successful systems in practice such as Google's MapReduce and Apache's Hadoop address these problems by only allowing a certain class of highly distributable tasks defined by local computations that can be applied in any order to the input. The fundamental question that arises is: How does the class of computational tasks supported by these systems differ from the class for which streaming solutions exist? We introduce a simple algorithmic model for massive, unordered, distributed (mud) computation, as implemented by these systems. We show that in principle, mud algorithms are equivalent in power to symmetric streaming algorithms. More precisely, we show that any symmetric (order-invariant) function that can be computed by a streaming algorithm can also be computed by a mud algorithm, with comparable space and communication complexity. Our simulation uses Savitch's theorem and therefore has superpolynomial time complexity. We extend our simulation result to some natural classes of approximate and randomized streaming algorithms. We also give negative results, using communication complexity arguments to prove that extensions to private randomness, promise problems, and indeterminate functions are impossible. We also introduce an extension of the mud model to multiple keys and multiple rounds.
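One way to picture the kind of computation described above is as three pieces: a local function applied to each input item independently, a binary aggregator that merges two intermediate messages, and a final post-processing step. The sketch below is illustrative only. The names `phi`, `combine`, and `eta` and the choice of task (the range of a multiset of numbers) are assumptions made for the example, not taken from the abstract. It checks that merging messages in a random order gives the same answer as a single left-to-right streaming pass.

```python
import random


def phi(x):
    """Local function: map one input item to a message (min, max)."""
    return (x, x)


def combine(a, b):
    """Aggregator: merge two messages; associative and commutative here."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def eta(q):
    """Post-processing: turn the final message into the answer."""
    return q[1] - q[0]


def run_stream(items):
    """Single-pass streaming evaluation in the given input order."""
    state = phi(items[0])
    for x in items[1:]:
        state = combine(state, phi(x))
    return eta(state)


def run_mud(items, rng):
    """Evaluate by merging messages in an arbitrary (random) order."""
    msgs = [phi(x) for x in items]
    rng.shuffle(msgs)
    while len(msgs) > 1:
        # Pick a random adjacent pair and replace it by its merge.
        i = rng.randrange(len(msgs) - 1)
        msgs[i:i + 2] = [combine(msgs[i], msgs[i + 1])]
    return eta(msgs[0])


data = [7, 3, 9, 1, 4]
print(run_stream(data), run_mud(data, random.Random(0)))
```

The range is a symmetric function of the input, so the order of merging cannot change the result; a function that depends on input order would not fit this pattern.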
Given an n-vertex graph G, a drawing of G in the plane is a mapping of its vertices into points of the plane, and its edges into continuous curves, connecting the images of their endpoints. A crossing in such a drawing is a point where two such curves intersect. In the Minimum Crossing Number problem, the goal is to find a drawing of G with minimum number of crossings. The value of the optimal solution, denoted by OPT, is called the graph's crossing number. This is a very basic problem in topological graph theory, that has received a significant amount of attention, but is still poorly understood algorithmically. The best currently known efficient algorithm produces drawings with $O(\log^2 n) \cdot (n + \mathrm{OPT})$ crossings on bounded-degree graphs, while only a constant factor hardness of approximation is known. A closely related problem is Minimum Planarization, in which the goal is to remove a minimum-cardinality subset of edges from G, such that the remaining graph is planar. Our main technical result establishes the following connection between the two problems: if we are given a solution of cost k to the Minimum Planarization problem on graph G, then we can efficiently find a drawing of G with at most $\mathrm{poly}(d) \cdot k \cdot (k + \mathrm{OPT})$ crossings, where d is the maximum degree in G. This result implies an $O(n \cdot \mathrm{poly}(d) \cdot \log^{3/2} n)$-approximation for Minimum Crossing Number, as well as improved algorithms for special cases of the problem, such as, for example, k-apex and bounded-genus graphs.
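To make the notion of a crossing concrete, the following minimal sketch counts crossings in one particular kind of drawing: a straight-line drawing with vertices in general position. This is a simplifying assumption, since the abstract allows arbitrary continuous curves, and the code is not the algorithm of the paper. The function names and the two example layouts of the complete graph on four vertices are hypothetical.

```python
from itertools import combinations


def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if open segments ab and cd intersect at a single interior point."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def count_crossings(pos, edges):
    """Count crossing pairs of edges in a straight-line drawing."""
    total = 0
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) < 4:
            continue  # edges sharing an endpoint are not counted as crossing
        if segments_cross(pos[u], pos[v], pos[w], pos[x]):
            total += 1
    return total


k4_edges = list(combinations(range(4), 2))
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}      # vertices in convex position
triangle = {0: (0, 0), 1: (4, 0), 2: (0, 4), 3: (1, 1)}    # one vertex inside a triangle
print(count_crossings(square, k4_edges), count_crossings(triangle, k4_edges))
```

The two layouts draw the same graph with different numbers of crossings, which is why the problem asks for the minimum over all drawings rather than the count in any single one.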
The Gromov-Hausdorff (GH) distance is a natural way to measure distance between two metric spaces. We prove that it is NP-hard to approximate the Gromov-Hausdorff distance better than a factor of 3 for geodesic metrics on a pair of trees. We complement this result by providing a polynomial time $O(\min\{n, \sqrt{rn}\})$-approximation algorithm for computing the GH distance between a pair of metric trees, where r is the ratio of the longest edge length in both trees to the shortest edge length. For metric trees with unit length edges, this yields an $O(\sqrt{n})$-approximation algorithm. The Gromov-Hausdorff distance (or GH distance for brevity) [10] is one of the most natural distance measures between metric spaces, and has been used, for example, for matching deformable shapes [3,15], and for analyzing hierarchical clustering trees [5]. Informally, the Gromov-Hausdorff distance measures the additive distortion suffered when mapping one metric space to another using a correspondence between their points. Multiple approaches have been proposed to estimate the Gromov-Hausdorff distance [3,14,15]. Despite much effort, the problem of computing, either exactly or approximately, GH distance has remained elusive. The problem is not known to be NP-hard, and computing the GH distance, even approximately, for graphic metrics is at least as hard as the graph isomorphism problem. Indeed, the metrics for two graphs have GH distance 0 if and only if the two graphs are isomorphic. Motivated by this trivial hardness result, it is natural to ask whether GH distance becomes easier in more restrictive settings such as geodesic metrics over trees, where efficient algorithms are known for checking isomorphism [1]. Related work. Most work on associating points between two metric spaces involves embedding a given high dimensional metric space into an infinite host space of lower dimensional metric spaces.
However, there is some work on finding a bijection between points in two given finite metric spaces that minimizes typically multiplicative distortion of distances between points and their images, with some limited results on additive distortion. Kenyon et al. [13] give an optimal algorithm for minimizing the multiplicative distortion of a bijection between two equal-sized finite metric spaces, and a parameterized polynomial time algorithm that finds the optimal bijection between an arbitrary unweighted graph metric and a bounded-degree tree metric. Papadimitriou and Safra [17] show that it is NP-hard to approximate the multiplicative distortion of any bijection between two finite 3-dimensional point sets to within any additive constant or to a factor better than 3. Hall and Papadimitriou [11] discuss the additive distortion problem: given two equal-sized point sets $S, T \subset \mathbb{R}^d$, find the smallest $\Delta$ such that there exists a bijection $f : S \to T$ with additive distortion at most $\Delta$. They show that it is NP-hard to approximate by a factor better than 3 in $\mathbb{R}^3$, and also give a 2-approximation for $\mathbb{R}^1$ and a 5-approximation for the more general problem of embedding an arbitrary metric space onto $\mathbb{R}^1$. ...
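The difference between a correspondence and a bijection can be illustrated on tiny finite metric spaces. The sketch below is a brute-force illustration, not the approximation algorithm of the paper, and it runs in exponential time. It assumes the standard definition of the GH distance as half the minimum additive distortion over all correspondences, and it takes the additive distortion of a bijection to be the largest absolute change in any pairwise distance. The function names and the example spaces are hypothetical.

```python
from itertools import permutations, product


def distortion(R, dX, dY):
    """Largest |dX(x, x') - dY(y, y')| over pairs (x, y), (x', y') in R."""
    return max(abs(dX[x][x2] - dY[y][y2]) for x, y in R for x2, y2 in R)


def gh_distance(dX, dY):
    """Half the minimum distortion over all correspondences (brute force)."""
    n, m = len(dX), len(dY)
    pairs = list(product(range(n), range(m)))
    best = float("inf")
    for mask in range(1, 1 << len(pairs)):
        R = [p for k, p in enumerate(pairs) if (mask >> k) & 1]
        # A correspondence must cover every point of both spaces.
        if {x for x, _ in R} != set(range(n)):
            continue
        if {y for _, y in R} != set(range(m)):
            continue
        best = min(best, distortion(R, dX, dY))
    return best / 2


def best_bijection_distortion(dX, dY):
    """Minimum additive distortion over bijections between equal-sized spaces."""
    n = len(dX)
    assert n == len(dY), "bijections need equal-sized spaces"
    return min(distortion(list(enumerate(p)), dX, dY)
               for p in permutations(range(n)))


PATH = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]      # shortest-path metric of a 3-vertex path
TRIANGLE = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # shortest-path metric of a 3-cycle
POINT = [[0]]                                 # one-point space

print(gh_distance(PATH, PATH), gh_distance(PATH, TRIANGLE), gh_distance(PATH, POINT))
print(best_bijection_distortion(PATH, TRIANGLE))
```

A correspondence may relate one point to several, so it can compare spaces of different sizes, such as the path and the single point; a bijection cannot. Every bijection is a correspondence, so for equal-sized spaces the bijection optimum is never smaller than twice the GH distance.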