2014
DOI: 10.1093/bioinformatics/btu343

SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision

Abstract: SparkSeq is available under the open-source Apache 2.0 license at https://bitbucket.org/mwiewiorka/sparkseq/.


Cited by 98 publications (44 citation statements)
References 8 publications
“…The paper is also valuable for new developments using Spark: it explains how each Spark feature can be used to speed up their algorithm. Next, SparkSeq [25] is a platform for analysing sequencing data using Spark. The authors state that its main benefits are keeping the datasets in memory instead of on disk and that it enables interactive analysis.…”
Section: Related Work (mentioning)
confidence: 99%
“…It distributes massive data collections across multiple nodes within a cluster of commodity servers. Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it does not do distributed storage. To study the utility of Apache Spark in the genomic context, SparkSeq was created [72]. It is a general-purpose, flexible, and easily extendable library for genomic cloud computing, and can be used to build genomic analysis pipelines in Scala and run them in an interactive way.…”
Section: Most Bioinformatics Tools Are Not Cloud-aware (mentioning)
confidence: 99%
“…A generalization of the MapReduce model, Spark [16] has enabled the design of many complex cloud based applications and demonstrated very good performance numbers [17], [18], [19], [20]. To achieve such performance, the Spark runtime relies on an innovative data structure, called Resilient Distributed Dataset (RDD) [16], which is used to store distributed data collections with the support of parallel access and fault-tolerance.…”
Section: Introduction (mentioning)
confidence: 99%