2017
DOI: 10.1007/978-3-319-67428-5_14

Fast Label Extraction in the CDAWG

Abstract: The compact directed acyclic word graph (CDAWG) of a string T of length n takes space proportional just to the number e of right extensions of the maximal repeats of T, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which e grows significantly more slowly than n. We reduce from O(m log log n) to O(m) the time needed to count the number of occurrences of a pattern of length m, using an existing data structure that takes an amount of space …
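To make the measure e in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure and runs in polynomial time on tiny inputs only. It assumes one common convention that the abstract does not state: T is bracketed by two sentinel characters (here `#` and `$`, assumed not to occur in T), a substring counts as a maximal repeat when its occurrences are preceded by at least two distinct characters and followed by at least two distinct characters, and the empty string is included. The function name `right_extensions_of_maximal_repeats` is hypothetical.

```python
from collections import defaultdict


def right_extensions_of_maximal_repeats(text, left_end="#", right_end="$"):
    """Brute-force map from each maximal repeat of `text` to its right extensions.

    Assumption (not from the source): `left_end` and `right_end` are sentinels
    that do not occur in `text`, and the empty string counts as a repeat.
    """
    s = left_end + text + right_end
    n = len(s)
    left = defaultdict(set)   # substring -> characters seen just before it
    right = defaultdict(set)  # substring -> characters seen just after it

    # Visit every occurrence s[i:j] lying strictly inside the sentinels,
    # including the empty substring (i == j).
    for i in range(1, n):
        for j in range(i, n):
            w = s[i:j]
            left[w].add(s[i - 1])
            right[w].add(s[j])

    # Keep substrings that are both left-maximal and right-maximal.
    return {
        w: sorted(right[w])
        for w in left
        if len(left[w]) >= 2 and len(right[w]) >= 2
    }


repeats = right_extensions_of_maximal_repeats("abab")
e = sum(len(chars) for chars in repeats.values())
# For "abab" the maximal repeats are "" and "ab", and e == 5.
```

Under this convention, e is the sum of the right-extension counts. On a highly repetitive string, this sum can stay far below n, which is the property the abstract relies on.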

Cited by 23 publications (36 citation statements)
References: 21 publications
“…These indexes achieve optimal-time queries (i.e., asymptotically equal to those of suffix trees [50]) at the price of a space consumption higher than that of other compressed indexes. Namely, the former index [26] requires O(r lg(n/r)) words of space, r being the number of equal-letter runs in the BWT of S, while the latter [3] uses O(e) words, e being the size of the CDAWG of S. These two measures (especially e) have been experimentally confirmed to be not as small as others, such as the size of LZ77, on repetitive collections [4].…”
Section: Introduction (mentioning)
confidence: 92%
“…4. Self-indexes with efficient extraction require Ω(z log(n/z)) space [105,21,43,10,15], Ω(g) space [17,14], or Ω(e) space [111,7]. 5.…”
Section: Index (mentioning)
confidence: 99%
“…3) O(r log(n/r)) O(m + occ) This paper (Thm. 4) O(rw log σ (n/r)) O(m log(σ)/w + occ) Belazzougui et al [6,Thm. 4] O(e) O(m log log n + occ) Takagi et al [ O(z log n) O( + log n) Rytter [88], Charikar et al [18] O(z log(n/z)) O( + log n) Bille et al [13,Lem.…”
Section: Index Space (mentioning)
confidence: 99%
“…Many proposals since then aimed at reducing the locating time by building on other measures related to repetitiveness: indexes based on the Lempel-Ziv parse [61] of T, with size bounded in terms of the number z of phrases [58,36,79,6]; indexes based on the smallest context-free grammar [18] that generates T, with size bounded in terms of the size g of the grammar [22,23,35]; and indexes based on the size e of the smallest automaton (CDAWG) [16] recognizing the substrings of T [6,94,4]. The achievements are summarized in Table 1; note that none of those later approaches is able to count the occurrences without enumerating them all.…”
Section: Introduction (mentioning)
confidence: 99%