2014
DOI: 10.1093/bioinformatics/btu584

Merging of multi-string BWTs with applications

Abstract: We present a novel algorithm that merges multi-string BWTs in [Formula: see text] time where LCS is the length of their longest common substring between any of the inputs, and N is the total length of all inputs combined (number of symbols) using [Formula: see text] bits where F is the number of multi-string BWTs merged. This merged multi-string BWT is also shown to have a higher compressibility compared with the input multi-string BWTs separately. Additionally, we explore some uses of a merged multi-string BW…

Citations: cited by 42 publications (44 citation statements)
References: 26 publications
“…The msBWTs were constructed using a hybrid combination of ropeBWT (Li 2014) and the msBWT merge algorithm (Holt and McMillan 2014) as described at https://github.com/holtjma/msbwt/wiki. The msBWTs are a lossless compressed form of the raw, sequenced reads that can be efficiently queried to find number of occurrences of and/or the associated read fragments containing any specified subsequence or k-mer.…”
Section: Methods (mentioning)
confidence: 99%
“…This is due to two factors: First, Zhang et al use LoRDEC version 0.8 with the default parameters, while we use version 0.9 with the parameters suggested for E. coli in the LoRDEC paper [6]. Second, Zhang et al use FMLRC version 0.1.2 and construct the BWT with msBWT [37], while we use version 1.0.0 and construct the BWT with RopeBWT2 [38] as recommended by the FMLRC documentation. Table 3 shows the results.…”
Section: Error Correction (mentioning)
confidence: 99%
“…Holt and McMillan have recently extended a data structure for string compression, the multi-string Burrows–Wheeler transform (msBWT) (Bauer et al 2013), to next-generation sequencing data (Holt et al 2014). A msBWT is a compressed, indexed representation of raw, unaligned sequence reads which allows fast queries for specific sequences over very large datasets (Fig.…”
Section: Resources For Next-generation Sequencing (mentioning)
confidence: 99%
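
Several of the statements above describe querying an msBWT for the number of occurrences of a k-mer across a read set. As a rough illustration of that kind of query, here is a minimal Python sketch. It is not the msbwt package's API or the paper's merge algorithm: `build_msbwt` and `count_kmer` are hypothetical names, the construction sorts all rotations naively (quadratic memory, toy inputs only), and the counting step is the standard FM-index backward search with ranks computed by plain string scans instead of a compressed index.

```python
def build_msbwt(reads):
    """Naive multi-string BWT (illustrative only).

    Each read gets a '$' terminator, every rotation of every terminated
    read is sorted together, and the last symbol of each sorted rotation
    is emitted. Assumes reads contain no '$' and that '$' sorts before
    every read symbol (true for uppercase DNA letters in ASCII).
    """
    rotations = []
    for read in reads:
        s = read + "$"
        rotations.extend(s[i:] + s[:i] for i in range(len(s)))
    rotations.sort()
    return "".join(r[-1] for r in rotations)


def count_kmer(bwt, kmer):
    """Count occurrences of kmer across all reads by backward search."""
    # first_row[c] = number of symbols in the BWT strictly smaller than c,
    # i.e. the first sorted row that starts with c.
    first_row, total = {}, 0
    for c in sorted(set(bwt)):
        first_row[c] = total
        total += bwt.count(c)

    lo, hi = 0, len(bwt)  # half-open range of rows matching the suffix so far
    for c in reversed(kmer):
        if c not in first_row:
            return 0
        # Rank queries done by scanning; a real index would precompute these.
        lo = first_row[c] + bwt.count(c, 0, lo)
        hi = first_row[c] + bwt.count(c, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    reads = ["ACGT", "CGTA", "GTAC"]  # made-up toy reads
    bwt = build_msbwt(reads)
    for kmer in ("CGT", "GTA", "TT"):
        print(kmer, count_kmer(bwt, kmer))
```

The query only ever touches the BWT string and the per-symbol counts, which is why a compressed BWT of the reads can answer it without keeping the reads themselves in memory.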