We study the problem of matching a string in a labeled graph. Previous research has shown that unless the Orthogonal Vectors Hypothesis (OVH) is false, one cannot solve this problem in strongly sub-quadratic time, nor index the graph in polynomial time to answer queries efficiently (Equi et al. ICALP 2019, SOFSEM 2021). These conditional lower-bounds cover even deterministic graphs with binary alphabet, but there naturally exist also graph classes that are easy to index: For example, Wheeler graphs (Gagie et al. Theor. Comp. Sci. 2017) cover graphs admitting a Burrows-Wheeler transform -based indexing scheme. However, it is NP-complete to recognize if a graph is a Wheeler graph (Gibney, Thankachan, ESA 2019). We propose an approach to alleviate the construction bottleneck of Wheeler graphs. Rather than starting from an arbitrary graph, we study graphs induced from multiple sequence alignments (). Elastic degenerate strings (Bernadini et al. SPIRE 2017, ICALP 2019) can be seen as such graphs, and we introduce here their generalization: elastic founder graphs. We first prove that even such induced graphs are hard to index under OVH. Then we introduce two subclasses, repeat-free and semi-repeat-free graphs, that are easy to index. We give a linear time algorithm to construct a repeat-free (non-elastic) founder graph from a gapless , and (parameterized) near-linear time algorithms to construct a semi-repeat-free (repeat-free, respectively) elastic founder graph from general . Finally, we show that repeat-free founder graphs admit a reduction to Wheeler graphs in polynomial time.
We consider the following string matching problem on a node-labeled graph G = (V, E): given a pattern string P , decide whether there exists a path in G whose concatenation of node labels equals P . This is a basic primitive in various problems in bioinformatics, graph databases, or networks. The hardness results of Backurs and Indyk (FOCS 2016) imply that this problem cannot be solved in better than O(|E||P |) time, under the Orthogonal Vectors Hypothesis (OVH), and this holds even under various restrictions on the graph (Equi et al., ICALP 2019).In this paper we consider its offline version, namely the one in which we are allowed to index the graph in order to support time-efficient string matching queries. In fact, this version is the one relevant in practical application such as the ones mentioned above. While the online version has been believed to be hard even before the above-mentioned hardness results, it was tantalizing in the string matching community to believe that the offline version can allow for sub-quadratic time queries, e.g. at the cost of a high-degree polynomial-time indexing.We disprove this belief, showing that, under OVH, no polynomial-time indexing scheme of the graph can support querying P in time O(|E| δ |P | β ), with either δ < 1 or β < 1. We prove this tight bound employing a known self-reducibility technique, e.g. from the field of dynamic algorithms, which translates conditional lower bounds for an online problem to its offline version.As a side-contribution, we formalize this technique with the notion of linear independentcomponents reduction, allowing for a simple proof of our result. As another illustration that hardness of indexing follows as a corollary of a linear independent-components reduction, we also translate the quadratic conditional lower bound of Backurs and Indyk (STOC 2015) for the problem of matching a query string inside a text, under edit distance. We obtain an analogous tight quadratic lower bound for its offline version, improving the recent result of Cohen-Addad, Feuilloley and Starikovskaya (SODA 2019), but with a slightly different boundary condition.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.