2016
DOI: 10.1016/j.jda.2016.03.003

Lightweight LCP construction for very large collections of strings

Abstract: The longest common prefix array is a very advantageous data structure that, combined with the suffix array and the Burrows-Wheeler transform, makes it possible to efficiently compute some combinatorial properties of a string that are useful in several applications, especially in biological contexts. Nowadays, the input data for many problems are large collections of strings, for instance the data coming from "next-generation" DNA sequencing (NGS) technologies. In this paper we present the first lightweight algorithm (called extLCP)…

citations
Cited by 26 publications
(44 citation statements)
references
References 25 publications
“…External memory LCP and BWT computation with applications n = Σ_{h=1}^{k} n_h. The multi-string BWT [10,25] of s_1, . .…”
Section: :4 (mentioning)
confidence: 99%
“…Nevertheless, the simplicity of the algorithm makes it very effective for collections of relatively short sequences, and this has become the reference tool for this problem. This approach was later extended [10] to compute also the LCP values with the same asymptotic number of I/Os. When computing also the LCP values, or when the input strings have different lengths, the algorithm uses O(m) words of RAM, where m is the number of input sequences.…”
Section: Introduction (mentioning)
confidence: 99%
“…The longest common prefix (LCP) array of the collection S [30,18,24] is the array lcp(S) of length N + 1, such that lcp(S)[i], with 2 ≤ i ≤ N, is the length of the longest common prefix between the suffixes associated to the positions i and i − 1 in ebwt(S) and lcp(S)[1] = lcp(S)[N + 1] = −1 set by default. We denote by LCP(i, j) the length of the LCP between the suffixes associated with positions i and j in ebwt(S), i.e.…”
Section: Preliminaries (mentioning)
confidence: 99%
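
The definition quoted above can be illustrated with a small sketch. This is a naive, in-memory construction for toy inputs, not the extLCP algorithm of the paper. It assumes each string s_j ends with its own end-marker $_j, that $_i < $_j whenever i < j, and that every end-marker is smaller than any ordinary symbol. The function names (`sorted_suffixes`, `common_prefix_length`, `lcp_array`) and the tuple encoding of symbols are hypothetical choices made only for this illustration.

```python
def sorted_suffixes(strings):
    """Return all suffixes of the collection in lexicographic order.

    Assumption for this sketch: ordinary symbols are encoded as (1, ch) and
    the end-marker of string j as (0, j), so every end-marker is smaller
    than any ordinary symbol and the end-markers are pairwise distinct.
    """
    suffixes = []
    for j, s in enumerate(strings):
        symbols = [(1, ch) for ch in s] + [(0, j)]
        for start in range(len(symbols)):
            suffixes.append(tuple(symbols[start:]))
    return sorted(suffixes)


def common_prefix_length(a, b):
    """Length of the longest common prefix of two symbol sequences."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def lcp_array(strings):
    """Naive LCP array of a collection, following the quoted definition.

    The returned Python list has length N + 1; list index p - 1 holds the
    value for the 1-based position p, so the first and last entries are -1.
    """
    suffixes = sorted_suffixes(strings)
    n_total = len(suffixes)  # N: total length, end-markers included
    lcp = [-1] * (n_total + 1)
    for i in range(2, n_total + 1):  # 1-based positions 2..N
        lcp[i - 1] = common_prefix_length(suffixes[i - 1], suffixes[i - 2])
    return lcp


if __name__ == "__main__":
    print(lcp_array(["ab", "ab"]))
```

For the collection {ab, ab}, the sorted suffixes are $_0, $_1, ab$_0, ab$_1, b$_0, b$_1, so the two copies of "ab" share a prefix of length 2 and the two copies of "b" share a prefix of length 1. Because the end-markers are distinct, they never contribute to a common prefix.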
“…The Burrows-Wheeler transform (BWT), originally introduced as a tool for data compression [4], has found application in the compact representation of many different data structures. After the seminal works [31] showing that the BWT can be used as a compressed full text index for a single string, many researchers have proposed variants of this transformation for string collections [5,24], trees [9,10], graphs [3,27,35], and alignments [30,29]. See [13] for an attempt to provide a unified view of these variants.…”
Section: Introduction (mentioning)
confidence: 99%
“…Historically, the first of such generalizations is the circular BWT [24] considered in Section 6. Here we consider the generalization proposed in [5] which is the one most used in applications. Let t_0[1, n_0] and t_1[1, n_1] be such that t_0[n_0] = $_0 and t_1[n_1] = $_1 where $_0 < $_1 are two symbols not appearing elsewhere in t_0 and t_1 and smaller than any other symbol.…”
Section: Introduction (mentioning)
confidence: 99%