2015
DOI: 10.1093/bioinformatics/btv179

Halvade: scalable sequence analysis with MapReduce

Abstract: Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine. Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been…

Cited by 74 publications (58 citation statements)
References 18 publications
“…In addition to optimizations of the traditional variant calling algorithms [10–13], the community also has been calling for a variant calling toolkit that can take advantage of dedicated MapReduce platforms, as Hadoop [23] and especially Spark [24–26] are more appropriate for this type of genomic data analysis compared to traditional high performance computing (HPC). Thus GATK4, first officially released in January of 2018, is meant to be eventually deployed on data analytics platforms.…”
Section: Introduction
mentioning
confidence: 99%
“…elPrep 4 achieves its speedups while offering the flexibility to freely plug pipeline steps in or out, and producing the same results as reference implementations of these steps in GATK 4, Picard, and SAMtools. elPrep 4 works with community-defined standards such as SAM/BAM/VCF/BED rather than defining its own formats for achieving its speedups, making elPrep 4 (backwards) compatible with other standard tools and workflows [7, 23, 24]. …”
Section: Discussion
mentioning
confidence: 99%
“…Many sequence aligners which use big data technologies like Apache Hadoop and Spark were implemented in last few years. CloudBurst [22], CloudAligner [23], Halvade [27], SEAL [33], BigBWA [25] and SparkBWA [26] are mostly used sequence aligners which use big data technologies.…”
Section: Related Work
mentioning
confidence: 99%
“…Sequence alignment tools like BigBWA [25], Halvade [27] and SparkBWA [26] are very accurate but they suffer from high time/space complexity for index generation.…”
mentioning
confidence: 99%