2017
DOI: 10.1007/s41019-017-0047-z

A Review of Scalable Bioinformatics Pipelines

Abstract: Scalability is increasingly important for bioinformatics analysis services, since these must handle larger datasets, more jobs, and more users. The pipelines used to implement analyses must therefore scale with respect to the resources on a single compute node, the number of nodes on a cluster, and also to cost-performance. Here, we survey several scalable bioinformatics pipelines and compare their design and their use of underlying frameworks and infrastructures. We also discuss current trends for bioinformat…

Cited by 30 publications (24 citation statements)
References 27 publications
“…In line with current international efforts of standardizing workflow descriptions (11), analysis workflows in Trecode are written using WDL (7) and are executed by the Cromwell workflow executer (11). When generating workflow code, our emphasis is on reuse, which has resulted in a compact non-redundant and well documented code base which is easy to maintain, extend and reuse.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…One of the notable attributes of the popular BLAST search is that it scales with the number of CPU cores [51]. As a result, to present NORTH as an alternative to BLAST-based approaches, we propose a scalable implementation of NORTH, which will aid clustering of plethora of genes.…”
Section: Scalability
Citation type: mentioning
confidence: 99%
“…The C3PO MUSC Transdisciplinary Collaborative Center system ingests clinical data from REDCap [36] for the project and integrates it into the OMOP model in its Spark/Hadoop framework. Since, C3PO was developed so it can generalize to other data types such as genomic and imaging, Spark/Hadoop frameworks [37][38][39][40] for genomic and imaging can be integrated in future versions of the system.…”
Section: Generalizability
Citation type: mentioning
confidence: 99%