2008
DOI: 10.1109/ms.2008.92
Scientific Software as Workflows: From Discovery to Distribution

Abstract: Scientific workflows, models of computation that capture the orchestration of scientific codes to conduct "in silico" research, are gaining recognition as an attractive alternative to script-based orchestration. Despite growing interest, there are a number of fundamental challenges that researchers developing scientific workflow technologies must address, including developing the underlying "science" of scientific workflows. In this article, we present a broad classification of scientific workflow environments…

Cited by 33 publications (24 citation statements). References 11 publications.
Citing publications span 2009–2022.
“…The current state of the art is to tell a scientist to rewrite her algorithm in Map Reduce in order to make it faster, or to integrate it into a data system; this takes away from the scientific stewardship of the algorithm and transfers it to the software engineering team, who may lack the necessary background and training to maintain that algorithm, and furthermore, computer scientists are largely not trained in scientific programming environments like Matlab, R, Python, IDL, etc. Scientific Workflow Systems can help here [25], and current efforts for DARPA XDATA, NASA's RCMES project, and NSF EarthCube will provide an evaluation environment for future work in this area. Intelligent data movement: At a recent Hadoop Summit meeting, I recall the VP of Amazon Web Services explaining to an audience member the best way to send 10+ terabytes of data to Amazon in order to process it on EC2.…”
Section: Looking Towards the Future
Citation type: mentioning
Confidence: 99%
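The "rewrite it in Map Reduce" pattern this quote criticizes is easy to picture with a toy example. Below is a minimal, framework-free sketch of the map/shuffle/reduce contract in plain Python; the station names, data, and helper functions are all hypothetical, and a real Hadoop job would express the same phases through the framework's own APIs rather than local function calls.

```python
# Minimal sketch of the MapReduce decomposition of a simple statistic
# (mean temperature per station). Illustrative only; not Hadoop code.
from collections import defaultdict

records = [
    ("station_a", 21.3), ("station_b", 19.8),
    ("station_a", 22.1), ("station_b", 20.4),
]

def mapper(record):
    # Map phase: emit (key, value) pairs; here (station, (temp, count)).
    station, temp = record
    yield station, (temp, 1)

def reducer(station, values):
    # Reduce phase: fold all values for one key into a single result.
    total = sum(t for t, _ in values)
    count = sum(c for _, c in values)
    return station, total / count

# Shuffle phase: group mapped pairs by key, as the framework would.
groups = defaultdict(list)
for record in records:
    for key, value in mapper(record):
        groups[key].append(value)

print([reducer(k, vs) for k, vs in groups.items()])
# [('station_a', 21.7...), ('station_b', 20.1)]
```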
“…End users with no expertise in big data analytics techniques face many challenges in analyzing their data. Workflow systems make big data analytics more accessible by managing common tasks that big data producers and algorithm developers execute to transform information throughout its lifecycle from data production, to processing/transformation, and ultimately to data distribution [Woollard et al 2008]. Experts can create workflows that represent complex multi-step analytic tasks and share them so that end users can use them with their own data. It is possible to design alternative workflows for the same task that have very different performance, particularly for big data.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
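The lifecycle this quote attributes to workflow systems, from data production through processing/transformation to distribution, can be pictured as a three-stage pipeline. The Python below is a minimal sketch under that reading; the stage names and the direct calls are illustrative and do not correspond to any particular workflow system's API.

```python
# Minimal sketch of a production -> processing -> distribution pipeline.
# Hypothetical stages; a workflow system would model this as a managed
# graph, schedule each stage, and record provenance.

def produce():
    # Data production: acquire raw observations (toy values).
    return [1.0, 2.0, 3.0, 4.0]

def transform(raw):
    # Processing/transformation: derive an analysis product.
    return sum(raw) / len(raw)

def distribute(product):
    # Data distribution: publish the result to downstream consumers.
    print(f"publishing product: {product}")

raw = produce()
product = transform(raw)
distribute(product)  # publishing product: 2.5
```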
“…Additionally, workflow techniques have been increasingly employed in scientific communities where scientists make discoveries by conducting complex sets of scientific computations and data analyses. These scientific explorative activities in biology, chemistry, engineering, geosciences, medicine and physics [6-8, 11, 13, 14, 22, 37, 43] are typically carried out at multiple sites, run over long periods of time and involve multidisciplinary processes and huge amounts of data. For example, a typical biological experimental scenario may consist of hundreds of computational steps; each step may be distributed over the Web by taking input data from various databases and disseminating the results obtained from one step to multiple downstream steps in a distributed environment.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
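The fan-out this quote describes, with one step's output feeding multiple downstream steps, amounts to executing a dependency DAG in topological order. Below is a minimal sketch using Python's standard-library graphlib; the step names are hypothetical stand-ins for the biological example, and a real engine would dispatch each step to a remote site rather than print it.

```python
# Minimal sketch of running a workflow DAG in dependency order.
from graphlib import TopologicalSorter

# Each (hypothetical) step maps to the set of steps it depends on.
dag = {
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "build_tree": {"align"},   # fan-out: 'align' feeds two steps
    "annotate": {"align"},
    "report": {"build_tree", "annotate"},
}

# static_order() yields steps so that every dependency runs first.
for step in TopologicalSorter(dag).static_order():
    print(f"running {step}")  # a real engine would dispatch each step
```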
“…The first challenge is the lack of sets of standard and commonly accepted descriptions for scientific workflows [19,39,43,13]. Currently, most workflow descriptions are based either on pure control flow models or on pure data flow models.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
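The distinction this quote draws can be made concrete: a control-flow description states step order explicitly, while a data-flow description lets order fall out of data availability. The sketch below runs the same two hypothetical steps both ways; the tiny data-flow scheduler is illustrative only, not any standard workflow language.

```python
# Minimal sketch contrasting control-flow and data-flow descriptions
# of the same two-step workflow. Step names and data are hypothetical.

def clean(data):
    # Drop missing values.
    return [x for x in data if x is not None]

def analyze(data):
    # Compute the mean.
    return sum(data) / len(data)

raw = [1.0, None, 3.0]

# Control-flow style: the order of steps is stated explicitly.
result = raw
for step in [clean, analyze]:
    result = step(result)
print(result)  # 2.0

# Data-flow style: each step names the data it consumes and produces;
# a step fires once its input exists, so ordering is implied.
data_flow = {
    "clean":   {"in": "raw",     "out": "cleaned", "fn": clean},
    "analyze": {"in": "cleaned", "out": "mean",    "fn": analyze},
}
store = {"raw": raw}
while data_flow:
    ready = [n for n, s in data_flow.items() if s["in"] in store]
    for name in ready:
        step = data_flow.pop(name)
        store[step["out"]] = step["fn"](store[step["in"]])
print(store["mean"])  # 2.0
```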