2015
DOI: 10.1007/978-3-319-26989-4_11

Scalable Cloud-Based Data Analysis Software Systems for Big Data from Next Generation Sequencing

Cited by 6 publications (3 citation statements)
References: 51 publications
“…Other features of the ecosystem include indexing and search capabilities similar to DataMed [51] and a metalearning framework for ranking and selection of the best predictive algorithms [52]. Many of the bioinformatics software tools that we have discussed in the previous section have been successfully deployed in cloud environments and can be adapted to the commons ecosystem, including Apache Spark, a successor to Apache Hadoop and MapReduce for data analysis of Next Generation Sequencing Data [53]. In addition, the data transfer and sharing component of the cloud-based commons ecosystem can include features discussed for the Globus Research Data Management Platform [54].…”
Section: Developing a Cloud-based Digital Ecosystem For Biomedical Re…
Citation type: mentioning; confidence: 99%
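The statement above cites Spark as a successor to Hadoop MapReduce for NGS data analysis. As a hedged illustration of the map/reduce pattern involved (not code from the cited chapter), the sketch below counts k-mers in short reads in plain Python: a map step emits (k-mer, 1) pairs for each read and a reduce step sums them per key. The reads, k = 3, and all function names are hypothetical. In PySpark, the two steps would roughly correspond to `RDD.flatMap` and `RDD.reduceByKey`, with the reads distributed across a cluster instead of held in one list.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every length-k substring of one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the values emitted for each k-mer key."""
    counts: Dict[str, int] = defaultdict(int)
    for kmer, n in pairs:
        counts[kmer] += n
    return dict(counts)


def count_kmers(reads: List[str], k: int) -> Dict[str, int]:
    # Each read is mapped independently; all emitted pairs are then reduced by key.
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(mapped)


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]  # hypothetical toy reads
    print(count_kmers(reads, 3))  # {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

Because every read is mapped independently and the reduction is an associative sum, the work can be split across machines; that property is what lets MapReduce-style frameworks, and Spark after them, scale this kind of counting.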
“…We have decided to further improve the performance by implementing this algorithm in a distributed framework using Spark. Apache Spark is a new framework for distributed parallel computation [that] can speed up the iterative applications, such as machine learning, when the data is cached in memory [19]. In this research work, we attempt to address this issue by training multiple models using Spark, and combining their output results.…”
Section: Accelerating the Classifier - Spark
Citation type: mentioning; confidence: 99%
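The statement above describes training multiple models with Spark and combining their outputs. A minimal pure-Python sketch of that partition, train, and combine pattern follows, assuming a nearest-centroid classifier per partition and majority voting as the combination rule; both are hypothetical choices, since the quoted text names neither the classifier nor the combiner. The toy data and all function names are also hypothetical. In PySpark, the per-partition training would roughly map to `RDD.mapPartitions`, with the trained models collected back to the driver for voting.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Sample = Tuple[Vector, str]
Model = Dict[str, Vector]  # class label -> centroid


def partition(data: Sequence[Sample], n_parts: int) -> List[List[Sample]]:
    """Split samples round-robin into n_parts chunks (a local stand-in for Spark partitions)."""
    parts: List[List[Sample]] = [[] for _ in range(n_parts)]
    for i, sample in enumerate(data):
        parts[i % n_parts].append(sample)
    return parts


def train_nearest_centroid(samples: Sequence[Sample]) -> Model:
    """Train one model on one partition: the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] += 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict(model: Model, x: Vector) -> str:
    """Return the class whose centroid is closest to x (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))


def ensemble_predict(models: Sequence[Model], x: Vector) -> str:
    """Combine the per-partition models by majority vote over their predictions."""
    votes = Counter(predict(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: class "A" near the origin, class "B" near (10, 10).
    data: List[Sample] = [
        ((0.0, 0.0), "A"), ((10.0, 10.0), "B"), ((1.0, 0.0), "A"),
        ((9.0, 10.0), "B"), ((0.0, 1.0), "A"), ((10.0, 9.0), "B"),
    ]
    # One model per partition; in Spark this training step would run on the executors.
    models = [train_nearest_centroid(p) for p in partition(data, 3)]
    print(ensemble_predict(models, (0.5, 0.5)))  # A
    print(ensemble_predict(models, (9.5, 9.5)))  # B
```

Majority voting is only one way to combine the outputs; averaging predicted probabilities is a common alternative, and the quoted statement does not say which one the authors used.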
“…Advanced big data analytics frameworks accelerate the storage and analysis of big omics data by facilitating the provision of scalable analytic infrastructures, such as the Hadoop Distributed File System (HDFS) for storage and the Spark Machine Learning libraries (MLlib) for analysis. [1] So as to cater [to] advanced bio-data analytics, big data and cloud computing technologies need to be tightly integrated and applied in a uniform fashion. Cloud computing has been demonstrated to be reliably scalable for the analysis of genomic data over single machines, as well as clusters and public cloud infrastructures.…”
Section: Introduction
Citation type: mentioning; confidence: 99%