2016
DOI: 10.1093/bioinformatics/btw266

deBWT: parallel construction of Burrows–Wheeler Transform for large collection of genomes with de Bruijn-branch encoding

Abstract: Motivation: With the development of high-throughput sequencing, the number of assembled genomes continues to rise. It is critical to well organize and index many assembled genomes to promote future genomics studies. Burrows–Wheeler Transform (BWT) is an important data structure of genome indexing, which has many fundamental applications; however, it is still non-trivial to construct BWT for large collection of genomes, especially for highly similar or repetitive genomes. Moreover, the state-of-the-art approach…

Cited by 12 publications (6 citation statements)
References 31 publications
“…A straightforward way to represent a pangenome is to store unaligned genomes in a full-text index that compresses redundancies in sequences identical between individuals [8][9][10]. We may retrieve individual genomes from the index, inspect the k-mer spectrum and test the presence of k-mers using standard techniques.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the configuration stage, programmers can easily specify the basic FindeR parameters, e.g., the BWT and FM-Index files, alphabet, FM-Index bucket width, bank number and RHU number, in the configuration file. We assume the BWT construction of the reference genomes and read pools are done in the cloud [56], [57], so that we can perform trillions of backward searches on them during all steps of genome analysis. At the beginning of compiling, the files of the BWT and FM-Index are copied into ReRAM chips and the other parameters are written into the SMC on the NVDIMM.…”
Section: System Support (mentioning)
confidence: 99%
“…Schemes Based on Burrows-Wheeler Transform Various works incorporate the Burrows-Wheeler Transform for more space efficiency [37,196,304,390]. 4.3.2 Grammar- and Text-Related Works. Peshkin [367] uses the notions from both graph grammars and graph compression to understand the structure of DNA and simultaneously be able to represent it compactly.…”
Section: Schemes Based On De Bruijn Graphs De Bruijn Graph (mentioning)
confidence: 99%
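
The citation statements above mention testing the presence of k-mers in a full-text index and running backward searches on a BWT and FM-Index. The following is a minimal sketch of that idea, assuming a naive rotation-sort construction on a single short string terminated by a `$` sentinel. It is not the deBWT method, which is not described in the excerpt here, and it would not scale to real genome collections. The function names `bwt`, `build_fm` and `count_occurrences` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def bwt(text):
    """Naive BWT: sort all rotations and take the last column.

    Assumes `text` ends with a unique '$' sentinel that sorts before
    every other character (true for A/C/G/T in ASCII).
    """
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    return "".join(rot[-1] for rot in rotations)


def build_fm(bwt_str):
    """Build the two FM-index tables used by backward search.

    C[c]      = number of characters in the text strictly smaller than c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    counts = Counter(bwt_str)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Backward search: count occurrences of `pattern` in the indexed text."""
    lo, hi = 0, n  # half-open range of matching rows in the sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    genome = "ACGTACGA$"  # toy sequence, not real data
    transformed = bwt(genome)
    C, occ = build_fm(transformed)
    for kmer in ("ACG", "CGT", "TTT"):
        print(kmer, count_occurrences(kmer, C, occ, len(transformed)))
```

A k-mer is present when the returned count is greater than zero. Storing the full `occ` table costs memory proportional to text length times alphabet size, which is why practical FM-indexes sample it at intervals (the "bucket width" mentioned in one of the statements above).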