2012
DOI: 10.1093/bioinformatics/bts380

Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees

Abstract: Motivation: Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in persona…

Cited by 5 publications (3 citation statements)
References 55 publications
“…These often focus on rare k-mers e.g. for species identification [11]—significantly similar in spirit to oligonucleotide probes in DNA-microarrays [12, 13]—or compute differences using [14] or Jaccard-distances [15].…”
Section: Introduction (mentioning)
confidence: 99%
“…In regions with genomic variation (e.g. those regions in which the investigator is usually most interested), maintaining good performance generally leads to lower sensitivity (Gontarz et al 2013; Mahmud et al 2012). In addition, the Burrows-Wheeler transform method is less flexible than hash based methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…The amount of data produced by current high-throughput DNA sequencing machines such as Illumina HiSeq 2500, which can generate as much as 100Gb a day, demands enormous computational power for primary analysis tasks such as read mapping. Although a large body of literature is concerned with read mapping [21,20,19,33,14,11,26,12,42,27,1], most approaches map one read at a time. The order of mapping is arbitrary regardless of similarities between reads which might hint towards the same mapping location.…”
Section: Introduction (mentioning)
confidence: 99%