Classic similarity measures of strings are longest common subsequence and Levenshtein distance (i.e., the classic edit distance). A classic similarity measure of curves is dynamic time warping. These measures can be computed by simple O(n^2) dynamic programming algorithms, and despite much effort no algorithms with significantly better running time are known. We prove that, even restricted to binary strings or one-dimensional curves, respectively, these measures do not have strongly subquadratic time algorithms, i.e., no algorithms with running time O(n^{2−ε}) for any ε > 0, unless the Strong Exponential Time Hypothesis fails.

We generalize the result to edit distance for arbitrary fixed costs of the four operations (deletion in one of the two strings, matching, substitution), by identifying trivial cases that can be solved in constant time, and proving quadratic-time hardness on binary strings for all other cost choices. This improves and generalizes the known hardness result for Levenshtein distance [Backurs, Indyk STOC'15] by the restriction to binary strings and the generalization to arbitrary costs, and adds important problems to a recent line of research showing conditional lower bounds for a growing number of quadratic-time problems.

As our main technical contribution, we introduce a framework for proving quadratic-time hardness of similarity measures. To apply the framework it suffices to construct a single gadget, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability.

Finally, we prove quadratic-time hardness for longest palindromic subsequence and longest tandem subsequence via reductions from longest common subsequence, showing that conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.
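To make the quadratic baseline concrete, here is a minimal sketch (in Python, not taken from the paper) of the textbook O(n^2) dynamic program for Levenshtein distance with unit costs; the function name and row-by-row layout are illustrative choices, not the authors' construction.

```python
def levenshtein(x: str, y: str) -> int:
    """Textbook O(|x|*|y|) dynamic program for the classic edit distance
    (unit costs for insertion, deletion, and substitution)."""
    n, m = len(x), len(y)
    # prev[j] holds the distance between x[:i-1] and y[:j]; we keep one row at a time.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            sub_cost = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(prev[j] + 1,             # delete x[i-1]
                          curr[j - 1] + 1,         # insert y[j-1]
                          prev[j - 1] + sub_cost)  # match or substitute
        prev = curr
    return prev[m]

print(levenshtein("kitten", "sitting"))  # 3
```

The same table-filling pattern, with max instead of min, gives the O(n^2) algorithms for longest common subsequence and dynamic time warping mentioned above.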
Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size n of data that originally has size N, and we want to solve a problem with time complexity T(·). The naïve strategy of "decompress-and-solve" gives time T(N), whereas "the gold standard" is time T(n): to analyze the compression as efficiently as if the original data was small.

We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (the Lempel-Ziv family, dictionary methods, and others) can be unified under the elegant notion of Grammar-Compressions. A vast literature, across many disciplines, established this as an influential notion for algorithm design.

We introduce a framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are:

• The O(nN√(log(N/n))) bound for LCS and the O(min{N log N, nM}) bound for Pattern Matching with Wildcards are optimal up to N^{o(1)} factors, under the Strong Exponential Time Hypothesis. (Here, M denotes the uncompressed length of the compressed pattern.)

• Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the k-Clique conjecture.

• We give an algorithm showing that decompress-and-solve is not optimal for Disjointness.
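For intuition on the setting, the following is a minimal sketch (assumed Python, with an invented toy grammar) of a grammar compression as a straight-line program: `expand` is the decompress-and-solve step that materializes the full string of length N, while `length` illustrates working directly on the grammar of size n.

```python
from functools import lru_cache

# Toy straight-line program (SLP): each rule is either a terminal character
# or the concatenation of two other rules. The grammar and names are
# illustrative assumptions, not an example from the paper.
rules = {
    "A": "a",           # terminal rule
    "B": "b",           # terminal rule
    "C": ("A", "B"),    # C -> AB        (derives "ab")
    "D": ("C", "C"),    # D -> CC        (derives "abab")
    "S": ("D", "D"),    # S -> DD        (derives "abababab")
}

def expand(symbol: str) -> str:
    """Decompress-and-solve: expand a rule into the full uncompressed string.
    The output length N can be exponential in the grammar size n."""
    rule = rules[symbol]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(left) + expand(right)

@lru_cache(maxsize=None)
def length(symbol: str) -> int:
    """Working on the compression: the uncompressed length N is computable
    in O(n) time from the grammar, without decompressing."""
    rule = rules[symbol]
    if isinstance(rule, str):
        return 1
    left, right = rule
    return length(left) + length(right)

print(expand("S"))   # 'abababab'  (the uncompressed string of length N)
print(length("S"))   # 8, computed directly from the grammar of size n
```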
We empirically analyze two versions of the well-known "randomized rumor spreading" protocol to disseminate a piece of information in networks. In the classical model, in each round, each informed node informs a random neighbor. In the recently proposed quasirandom variant, each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position of the list, but from then on informs its neighbors in the order of the list.While for sparse random graphs a better performance of the quasirandom model could be proven, all other results show that, independent of the structure of the lists, the same asymptotic performance guarantees hold as for the classical model.In this work, we compare the two models experimentally. Not only does this show that the quasirandom model generally is faster, but it also shows that the runtime is more concentrated around the mean. This is surprising given that much fewer random bits are used in the quasirandom process.These advantages are also observed in a lossy communication model, where each transmission does not reach its target with a certain probability, and in an asynchronous model, where nodes send at random times drawn from an exponential distribution. We also show that typically the particular structure of the lists has little influence on the efficiency.
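A minimal simulation sketch of the two push protocols (assumed Python, not the authors' experimental code) may help fix the model; the adjacency representation and parameters are illustrative assumptions.

```python
import random

def rumor_spreading(adj, start, quasirandom=False):
    """Simulate push rumor spreading and return the number of rounds until all
    nodes are informed. Classical model: each informed node pushes to a fresh
    uniformly random neighbor per round. Quasirandom model: each node starts at
    a random position of its (cyclic) neighbor list and then follows the list."""
    informed = {start}
    position = {start: random.randrange(len(adj[start]))}
    rounds = 0
    while len(informed) < len(adj):
        rounds += 1
        pushed = []
        for v in informed:
            neighbors = adj[v]
            if quasirandom:
                target = neighbors[position[v] % len(neighbors)]
                position[v] += 1                      # advance along the cyclic list
            else:
                target = random.choice(neighbors)     # fresh random neighbor
            pushed.append(target)
        for u in pushed:
            if u not in informed:
                informed.add(u)
                position[u] = random.randrange(len(adj[u]))
    return rounds

# Example: complete graph on 64 nodes, given as neighbor lists.
n = 64
adj = {v: [u for u in range(n) if u != v] for v in range(n)}
print("classical:  ", rumor_spreading(adj, 0, quasirandom=False))
print("quasirandom:", rumor_spreading(adj, 0, quasirandom=True))
```

The lossy and asynchronous variants studied in the paper can be obtained from such a simulation by dropping each push with some probability or by scheduling pushes at exponentially distributed times instead of synchronous rounds.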
We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings x and y of length n, a textbook algorithm solves LCS in time O(n^2), but although much effort has been spent, no O(n^{2−ε})-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams FOCS'15; Bringmann, Künnemann FOCS'15].

Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known.

To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size n := max{|x|, |y|}, the length of the shorter string m := min{|x|, |y|}, the length L of an LCS of x and y, the numbers of deletions δ := m − L and ∆ := n − L, the alphabet size, as well as the numbers of matching pairs M and dominant pairs d. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms (up to lower-order factors of the form n^{o(1)}). Specifically, we determine the optimal running time for LCS under SETH as (n + min{d, δ∆, δm})^{1±o(1)}. Polynomial improvements over this running time must necessarily refute SETH or exploit novel input parameters. We establish the same lower bound for any constant alphabet of size at least 3. For binary alphabet, we show a SETH-based lower bound of (n + min{d, δ∆, δM/n})^{1−o(1)} and, motivated by difficulties in improving this lower bound, we design an O(n + δM/n)-time algorithm, yielding again a matching bound.

We feel that our systematic approach yields a comprehensive perspective on the well-studied multivariate complexity of LCS, and we hope to inspire similar studies of multivariate complexity landscapes for further polynomial-time problems.
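As a reference point for these parameters, the following sketch (assumed Python, not from the paper) computes L via the textbook O(nm) dynamic program and derives n, m, δ, ∆, and M for a given input; the number of dominant pairs d is omitted for brevity.

```python
def lcs_parameters(x: str, y: str) -> dict:
    """Compute the LCS length L with the textbook O(nm) dynamic program and
    derive n, m, delta = m - L, Delta = n - L, and the number of matching
    pairs M. Dominant pairs d are not computed in this sketch."""
    if len(x) < len(y):
        x, y = y, x                       # ensure |x| >= |y|
    n, m = len(x), len(y)
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    L = prev[m]
    # M = number of index pairs (i, j) with x[i] == y[j].
    M = sum(1 for a in x for b in y if a == b)
    return {"n": n, "m": m, "L": L, "delta": m - L, "Delta": n - L, "M": M}

print(lcs_parameters("differential", "file comparison"))
```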