2014
DOI: 10.1155/2014/348725

Developing eThread Pipeline Using SAGA-Pilot Abstraction for Large-Scale Structural Bioinformatics

Abstract: While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the structural information they predict could uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes, such as those of prokaryotes, is large, typically many thousands, which prohibits their application as a genome-wide structural systems biology tool. To leverage its utility, we have develo…

Cited by 7 publications (4 citation statements)
References 34 publications
“…Because of this, processing is increasingly moving to grid- and cloud-based distributed computing facilities. From genetic sequencing [11] and bioinformatics [12] to neuroscience [13] and ecology [14], ever growing datasets have driven the development of distributed workflow systems in science [15] [16].…”
Section: Related Work (mentioning)
confidence: 99%
“…Because of this, processing is increasingly moving to grid- and cloud-based distributed computing facilities. From genetic sequencing [11] and bioinformatics [12] to neuroscience [13] and ecology [14], ever growing datasets have driven the development of distributed workflow systems in science [15] [16].…”
Section: Related Work (mentioning)
confidence: 99%
“…This requires adding data-intensive analysis while preserving high-performance computing capabilities. Having established the potential of the Pilot-Abstraction for a range of high-performance applications [36], [37], [38], we use it as the starting point for integrated high-performance compute and data-intensive analysis. We propose several extensions to RADICAL-Pilot to facilitate the integrated use of HPC and Hadoop frameworks using the Pilot-Abstraction.…”
Section: Integrating Hadoop and Spark With Radical-pilot (mentioning)
confidence: 99%
“…On the other hand, the four distributed infrastructures, Amazon EC2, cyberinfrastructure for reconfigurable optical networks (CRON), global environment for network innovations (GENI), and CloudLab, are distinctly different. Amazon EC2 is a compute‐on‐demand infrastructure as a service cloud environment that is recently widely utilized for research and science gateways. CRON is our testbed system that can be configured to emulate a distributed computing infrastructure connected by configurable networking conditions.…”
Section: Hadoop‐based Replica Exchange (mentioning)
confidence: 99%