2019
DOI: 10.1186/s13015-019-0148-5

Prefix-free parsing for building big BWTs

Abstract: High-throughput sequencing technologies have led to explosive growth of genomic databases, one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases, but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly repetitive, a characteristic that can be exploited to ease the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. In this paper, we introduce a preprocessin…

Cited by 58 publications (89 citation statements)
References 18 publications
“…Here, we describe our algorithm for building the SA or the sampled SA from the prefix free parse of a input string S, which is used to build the r-index. We first review the algorithm from [2] for building the BWT of S from the prefix free parse. Next, we show how to modify this construction to compute the SA or the sampled SA along with the BWT.…”
Section: Methods (mentioning)
confidence: 99%
“…It takes as input string S, and in one-pass generates a dictionary and a parse of S with the property that the BWT can be constructed from dictionary and parse using workspace proportional to their total size and O(|S|) time. Yet, the resulting index of Boucher et al [2] has no SA sample, and therefore, only supports counting and not locating. This makes this index not directly applicable to many bioinformatic applications, such as sequence alignment.…”
Section: Introduction (mentioning)
confidence: 91%
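The one-pass dictionary-and-parse step described in the statement above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper's code: the function names `window_hash`, `prefix_free_parse`, and `reconstruct`, the sentinel characters `\x01` and `\x00`, and the default window size and modulus. For brevity the hash is a plain polynomial hash recomputed for every window, where a real implementation would use a rolling hash.

```python
def window_hash(window: str, base: int = 256, prime: int = 1_000_000_007) -> int:
    """Deterministic polynomial hash of a short window (not a rolling hash)."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % prime
    return h


def prefix_free_parse(text: str, w: int = 2, modulus: int = 3):
    """Split text into overlapping phrases that end at 'trigger' windows.

    Assumption: the sentinels '\x01' (start) and '\x00' (end) do not occur in text.
    Returns (sorted dictionary of distinct phrases, parse as a list of ranks).
    """
    s = "\x01" + text + "\x00" * w
    phrases = []
    start = 0
    for i in range(1, len(s) - w + 1):
        window = s[i:i + w]
        is_last = i + w == len(s)  # the terminal window always closes a phrase
        if is_last or window_hash(window) % modulus == 0:
            phrases.append(s[start:i + w])
            start = i  # the next phrase begins with this trigger window
    dictionary = sorted(set(phrases))
    rank = {phrase: k for k, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def reconstruct(dictionary, parse, w: int = 2) -> str:
    """Invert the parse: consecutive phrases overlap by exactly w characters."""
    phrases = [dictionary[k] for k in parse]
    s = phrases[0] + "".join(phrase[w:] for phrase in phrases[1:])
    return s[1:len(s) - w]  # drop the start sentinel and the w end sentinels


if __name__ == "__main__":
    dictionary, parse = prefix_free_parse("GATTACAT!GATACAT!GATTAGATA")
    print(len(dictionary), "distinct phrases,", len(parse), "phrases in the parse")
```

On repetitive input the same phrases recur, so the dictionary tends to stay small relative to the text while the parse records only phrase ranks. Because every phrase ends with a trigger window and contains no other trigger window in its interior, no phrase is a proper prefix of another.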
“…Users should first download some prerequisite packages, and the source code from the github repos- These commands will install the binaries ri-buildfasta and ri-align in the system's default bin location (e.g., /usr/local/bin for Ubuntu users), together with bigbwt [1] and the SDSL library [5] (if it is not already present). If users want the binaries elsewhere, then they should use $ cmake -DCMAKE_INSTALL_PREFIX=<dest> ..…”
Section: Installation (mentioning)
confidence: 99%
“…Building on previous authors' work [11], Gagie, Navarro and Prezza [4] described how a fully functional variant of the FM-index for such a database could be stored in reasonable space: their variant takes O(r) machine words, where r is the number of runs in the BWT of the database, and thus is called the r-index. Prezza [14] gave a preliminary implementation, which was significantly extended by Boucher et al [1] and Kuhnle et al [6]. This paper is meant as a brief guide to the extended implementation.…”
Section: Introduction (mentioning)
confidence: 99%
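The quantity r in the statement above, the number of runs in the BWT, can be made concrete with a small sketch. This is a naive illustration only: it builds the BWT by sorting all rotations, which is far too slow for genomic databases and is not the construction used by the cited tools. The function names `naive_bwt` and `count_runs` are hypothetical, and the sketch assumes the terminator `$` does not occur in the text and sorts before every other character.

```python
from itertools import groupby


def naive_bwt(text: str, terminator: str = "$") -> str:
    """BWT via sorted rotations; quadratic space, for illustration only."""
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(bwt: str) -> int:
    """Number of maximal runs of equal characters, i.e. r."""
    return sum(1 for _ in groupby(bwt))


if __name__ == "__main__":
    bwt = naive_bwt("banana")
    print(bwt)              # annb$aa
    print(count_runs(bwt))  # 5
```

For highly repetitive inputs, r is typically much smaller than the text length, which is why an index occupying O(r) machine words is attractive for such databases.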