Dictionary matching and indexing with errors and don't cares

Cole, Richard; Gottlieb, Lee-Ad; Lewenstein, Moshe

doi:10.1145/1007352.1007374

Cited by 224 publications

(250 citation statements)

References 42 publications

Mentioning: 247

“…The improvement can be also embedded in the extensions. Searching Substrings Internally in the Suffix Tree Cole et al [7], when proposing data structures for indexing a text w[1..n] with k mismatches, k errors and k wildcards, suggested the LCP data structure. The LCP data structure comes in two variants, rooted LCP and unrooted LCP.…”

Section: arXiv:1406.7716v1 [cs.DS] 30 Jun 2014 (mentioning)

confidence: 99%

Weighted Ancestors in Suffix Trees

Gawrychowski

Lewenstein

Nicholson

2014

Algorithms - ESA 2014

Self Cite


Abstract. The classical, ubiquitous, predecessor problem is to construct a data structure for a set of integers that supports fast predecessor queries. Its generalization to weighted trees, a.k.a. the weighted ancestor problem, has been extensively explored and successfully reduced to the predecessor problem. It is known that any solution for both problems with an input set from a polynomially bounded universe that preprocesses a weighted tree in O(n polylog(n)) space requires Ω(log log n) query time. Perhaps the most important and frequent application of the weighted ancestors problem is for suffix trees. It has been a long-standing open question whether the weighted ancestors problem has better bounds for suffix trees. We answer this question positively: we show that a suffix tree built for a text w[1..n] can be preprocessed using O(n) extra space, so that queries can be answered in O(1) time. Thus we improve the running times of several applications. Our improvement is based on a number of data structure tools and a periodicity-based insight into the combinatorial structure of a suffix tree.
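To make the weighted ancestor query in the abstract concrete, here is a minimal sketch of a simple binary-lifting baseline that answers a query in O(log n) time. It is not the O(1)-time structure of the paper. The class name `WeightedAncestor`, the array layout (`parent`, `weight`), and the convention that a query returns the ancestor closest to the root with weight at least `w` are assumptions made for illustration; weights are assumed to increase strictly from parent to child, as string depths do in a suffix tree.

```python
from typing import List, Optional


class WeightedAncestor:
    """Binary-lifting baseline (illustrative, not the paper's data structure).

    Assumes node ids 0..n-1, parent[root] == root, and weights that
    strictly increase from every parent to its children.
    """

    def __init__(self, parent: List[int], weight: List[int]) -> None:
        n = len(parent)
        self.weight = weight
        # up[j][v] is the ancestor of v reached after 2**j parent steps
        # (saturating at the root).
        self.up = [list(parent)]
        levels = max(1, n.bit_length())
        for j in range(1, levels):
            prev = self.up[j - 1]
            self.up.append([prev[prev[v]] for v in range(n)])

    def query(self, v: int, w: int) -> Optional[int]:
        """Return the ancestor of v (possibly v itself) closest to the
        root whose weight is at least w, or None if weight[v] < w."""
        if self.weight[v] < w:
            return None
        # Try the largest jumps first; take a jump only if the target
        # still satisfies the weight bound.
        for jump in reversed(self.up):
            u = jump[v]
            if self.weight[u] >= w:
                v = u
        return v


# Hypothetical example tree: 0 is the root; 0 -> 1 -> 2 -> 3 and 0 -> 4.
parent = [0, 0, 1, 2, 0]
weight = [0, 2, 5, 6, 3]
wa = WeightedAncestor(parent, weight)
```

Preprocessing here takes O(n log n) time and space, which is exactly the kind of overhead the abstract's O(n)-space, O(1)-query result avoids for suffix trees.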


“…can be supported for a pattern p plus a time complexity equal to the size of the output. Using techniques presented in [22], the structure can be modified to solve the problem in O(nm log(nm) + n(c_1 log n)^(k+1)/k!) preprocessing time, and O(m + (c_2 log n)^k log log n) query time (c_1 and c_2 are constants); this approach is worse than the trie approach for small values of .…”

Section: Index Structures For Indeterminate Strings (mentioning)

confidence: 99%

String Data Structures for Computational Molecular Biology

Makris

Theodoridis

2010

Algorithms in Computational Molecular Biology


“…There exist some solutions avoiding the convolution method as well [18][19][20]. A number of solutions exist in the literature that consider the problem of text indexing with don't cares [21][22][23][24]. Notably, in the literature, the don't cares are also referred to as wildcards.…”

Section: Introduction (mentioning)

confidence: 99%

Indexing a sequence for mapping reads with a single mismatch

Crochemore

Langiu

Rahman

2014

Phil. Trans. R. Soc. A.


Mapping reads against a genome sequence is an interesting and useful problem in Computational Molecular Biology and Bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the Next Generation Sequencing (NGS). In the sequel we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in O(n log^(1+ε)n) time and space and can answer subsequent queries in O(m log log n + K) time. Here, n is the length of the sequence, m is the length of the read, 0 < ε < 1 and K is the optimal output size.
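For orientation, the query this abstract describes can be stated as a naive scan without any index. The sketch below is not the indexed algorithm of the paper: it runs in O(nm) time per read, against which the stated O(m log log n + K) query bound can be compared. The function name `occurrences_with_one_mismatch` and the reading of "a single mismatch" as "at most one mismatch" are assumptions made for illustration.

```python
from typing import List


def occurrences_with_one_mismatch(text: str, read: str) -> List[int]:
    """Return 0-based start positions where `read` aligns to `text`
    with at most one mismatching character (naive O(n*m) scan)."""
    n, m = len(text), len(read)
    hits: List[int] = []
    for start in range(n - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != read[offset]:
                mismatches += 1
                if mismatches > 1:
                    break  # this alignment already has too many mismatches
        if mismatches <= 1:
            hits.append(start)
    return hits


# Hypothetical example: an exact hit at 0 and a one-mismatch hit at 4.
positions = occurrences_with_one_mismatch("ACGTACGA", "ACGT")
```

The length of `positions` plays the role of the output size K in the abstract's query bound.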
