Proceedings of the 2017 ACM International Conference on Management of Data 2017
DOI: 10.1145/3035918.3064048

Massively Parallel Processing of Whole Genome Sequence Data

citations
Cited by 15 publications
(14 citation statements)
references
References 26 publications
“…In addition to enabling and evaluating horizontal scalability, the cost of an analysis and the choice of virtual machine flavors are becoming increasingly important for efficient execution of bioinformatics analysis, since pipelines are increasingly deployed and evaluated on commercial clouds [6,21,22]. However, even on dedicated clusters it is important to understand how to scale a pipeline up and out on the available resources to improve the utilization of the resources.…”
Section: Summary and Discussion (mentioning)
confidence: 99%
“…GESALL [21] is a genomic analysis platform for unmodified analysis tools that use the POSIX file system interface. An example pipeline implemented with GESALL is their implementation of the GATK variant calling reference pipeline that was used as an example in the ADAM paper [6].…”
Section: Gesall Variant Calling Pipeline (mentioning)
confidence: 99%
“…The development of bioinformatics tools based on "Big-Data" technologies started in late 2008 with work on Hadoop [42], [43] and has since continued to increase [44]. Most of the advancements in this area have come in NGS data analysis, where many Hadoop-based tools have been developed [45], [46], [46]-[48] and, more recently, [13], [17]. On the other hand, there is very little on distributed streaming computing applied to the life-sciences, apart from general architecture descriptions such as [49].…”
Section: Related Work (mentioning)
confidence: 99%
“…Traditionally [12], the workflow steps are run on a conventional High-Performance Computing (HPC) infrastructure - a set of computing nodes accessed through a batch queuing system and equipped with a parallel shared storage system. While this is, of course, a working solution, it requires a non-trivial amount of ad-hoc manual intervention to efficiently use the available computational resources and obtain the fast turn-around times that are needed for diagnostic applications [13]. The main issues here are how to divide the work of a single job among all computing nodes and how to make the system robust to transient or permanent hardware or software failures.…”
Section: Introduction (mentioning)
confidence: 99%