2019
DOI: 10.1007/978-3-030-17083-7_10

Efficient Construction of a Complete Index for Pan-Genomics Read Alignment

Abstract: While short read aligners, which predominantly use the FM-index, are able to easily index one or a few human genomes, they do not scale well to indexing databases containing thousands of genomes. To understand why, it helps to examine the main components of the FM-index in more detail, which is a rank data structure over the Burrows-Wheeler Transform (BWT) of the string that will allow us to find the interval in the string's suffix array (SA) containing pointers to starting positions of occurrences of a given …
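The mechanism the abstract outlines (rank queries over the BWT used to find a pattern's suffix array interval) can be sketched in a few lines of Python. This is only an illustrative toy, not the paper's construction: the function names (`suffix_array`, `bwt_from_sa`, `rank`, `backward_search`) are hypothetical, the suffix array is built by naive sorting, rank is computed by a linear scan instead of a succinct rank structure, and the text is assumed to end in a unique sentinel `$` that sorts before every other character.

```python
from collections import Counter


def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """BWT: the character preceding each suffix (wrapping around at position 0)."""
    return "".join(text[i - 1] for i in sa)


def rank(bwt, c, i):
    """Number of occurrences of c in bwt[:i] (linear scan, for illustration only)."""
    return bwt[:i].count(c)


def backward_search(bwt, pattern):
    """Return the half-open SA interval [lo, hi) of suffixes prefixed by pattern."""
    # C[c] = number of characters in the text strictly smaller than c
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    lo, hi = 0, len(bwt)
    # Extend the match one character at a time, from the pattern's last character
    for c in reversed(pattern):
        if c not in C:
            return (0, 0)
        lo = C[c] + rank(bwt, c, lo)
        hi = C[c] + rank(bwt, c, hi)
        if lo >= hi:
            return (0, 0)  # empty interval: pattern does not occur
    return (lo, hi)
```

With `text = "GATTACA$"`, calling `backward_search` on the BWT and reading `sa[lo:hi]` yields the starting positions of the pattern's occurrences. A real index replaces the linear-scan `rank` with a compact rank structure and stores only a sample of the suffix array.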

Cited by 11 publications (8 citation statements)
References 36 publications
“…• bigbwt: Use a so-called prefix-free parsing technique, which is shown to be useful to reduce the working space and at the same time accelerate BWT construction [14,16].…”
Section: Results; citation type: mentioning
confidence: 99%
“…Before that promise can be fulfilled, however, several obstacles must still be overcome: first, we need efficient algorithms to build RLBWTs and SA samples of genomic databases, which are the main components of r-indexes; second, we need an efficient way to update the r-index when we add a new genome to the database, because rebuilding it regularly will be prohibitively slow regardless of the algorithms we use; and third, as reads become longer and more likely to contain combinations of variation that we have seen before individually but not all together, we will need support for finding maximal exact matches between the read and the database. Boucher et al [14,15] and Kuhnle et al [16] have since made substantial progress on the first point, and in this paper we address the second one and give a theoretical solution to the third. As a by-product of making the r-index dynamic, we obtain an online algorithm for computing the LZ77 parse in space bounded in terms of the number of runs in the BWT.…”
Citation type: mentioning
confidence: 91%
“…Building on previous authors' work [11], Gagie, Navarro and Prezza [4] described how a fully functional variant of the FM-index for such a database could be stored in reasonable space: their variant takes O(r) machine words, where r is the number of runs in the BWT of the database, and thus is called the r-index. Prezza [14] gave a preliminary implementation, which was significantly extended by Boucher et al [1] and Kuhnle et al [6]. This paper is meant as a brief guide to the extended implementation.…”
Section: Introduction; citation type: mentioning
confidence: 99%
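The quantity r in the statement above, the number of runs in the BWT, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the r-index implementation: `naive_bwt` and `count_runs` are hypothetical names, the BWT is built by sorting all suffixes (impractical for genomic data), and the text is assumed to end in a unique sentinel `$`.

```python
from itertools import groupby


def naive_bwt(text):
    """BWT via naive suffix sorting; text must end in a unique sentinel '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)


def count_runs(s):
    """Number of maximal runs of equal characters in s (the r of the r-index)."""
    return sum(1 for _ in groupby(s))


# A highly repetitive text stands in for a database of near-identical genomes.
repetitive = "GATTACA" * 30 + "$"
r = count_runs(naive_bwt(repetitive))
n = len(repetitive)
```

For repetitive inputs like this one, `r` stays far below `n`, which is the property that lets a run-length encoded index fit in O(r) machine words.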
“…There is a theoretical proposal for supporting fast locate() queries in space proportional to the size of the run-length encoded BWT (Gagie et al, 2018). While there has been some progress in building the proposed index for large datasets (Kuhnle et al, 2019), scaling it up to TOPMed scale is still an open problem.…”
Citation type: mentioning
confidence: 99%