2008
DOI: 10.1109/ms.2008.92
Scientific Software as Workflows: From Discovery to Distribution

Abstract: Scientific workflows, models of computation that capture the orchestration of scientific codes to conduct "in silico" research, are gaining recognition as an attractive alternative to script-based orchestration. Despite growing interest, there are a number of fundamental challenges that researchers developing scientific workflow technologies must address, including developing the underlying "science" of scientific workflows. In this article, we present a broad classification of scientific workflow environments…

Cited by 33 publications (24 citation statements). References 11 publications.
Citing publications span 2009–2022.
“…The current state of the art is to tell a scientist to rewrite her algorithm in Map Reduce in order to make it faster, or to integrate it into a data system; this takes away from the scientific stewardship of the algorithm and transfers it to the software engineering team, who may lack the necessary background and training to maintain that algorithm, and furthermore, computer scientists are largely not trained in scientific programming environments like Matlab, R, Python, IDL, etc. Scientific Workflow Systems can help here [25], and current efforts for DARPA XDATA, NASA's RCMES project, and NSF EarthCube will provide an evaluation environment for future work in this area. Intelligent data movement: At a recent Hadoop Summit meeting, I recall the VP of Amazon Web Services explaining to an audience member the best way to send 10+ terabytes of data to Amazon in order to process it on EC2.…”
Section: Looking Towards the Future
Citation type: mentioning
Confidence: 99%
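The "rewrite it in Map Reduce" pattern this quote criticizes is easy to picture with a toy example. Below is a minimal, framework-free sketch of the map/shuffle/reduce contract in plain Python; the station names, data, and helper functions are all hypothetical, and a real Hadoop job would express the same phases through the framework's own APIs rather than local function calls.

```python
# Minimal sketch of the MapReduce decomposition of a simple statistic
# (mean temperature per station). Illustrative only; not Hadoop code.
from collections import defaultdict

records = [
    ("station_a", 21.3), ("station_b", 19.8),
    ("station_a", 22.1), ("station_b", 20.4),
]

def mapper(record):
    # Map phase: emit (key, value) pairs; here (station, (temp, count)).
    station, temp = record
    yield station, (temp, 1)

def reducer(station, values):
    # Reduce phase: fold all values for one key into a single result.
    total = sum(t for t, _ in values)
    count = sum(c for _, c in values)
    return station, total / count

# Shuffle phase: group mapped pairs by key, as the framework would.
groups = defaultdict(list)
for record in records:
    for key, value in mapper(record):
        groups[key].append(value)

print([reducer(k, vs) for k, vs in groups.items()])
# [('station_a', 21.7...), ('station_b', 20.1)]
```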
“…End users with no expertise in big data analytics techniques face many challenges in analyzing their data. Workflow systems make big data analytics more accessible by managing common tasks that big data producers and algorithm developers execute to transform information throughout its lifecycle from data production, to processing/transformation, and ultimately to data distribution [Woollard et al 2008]. Experts can create workflows that represent complex multi-step analytic tasks and share them so that end users can use them with their own data. It is possible to design alternative workflows for the same task that have very different performance, particularly for big data.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
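The lifecycle this quote attributes to workflow systems, from data production through processing/transformation to distribution, can be pictured as a three-stage pipeline. The Python below is a minimal sketch under that reading; the stage names and the direct calls are illustrative and do not correspond to any particular workflow system's API.

```python
# Minimal sketch of a production -> processing -> distribution pipeline.
# Hypothetical stages; a workflow system would model this as a managed
# graph, schedule each stage, and record provenance.

def produce():
    # Data production: acquire raw observations (toy values).
    return [1.0, 2.0, 3.0, 4.0]

def transform(raw):
    # Processing/transformation: derive an analysis product.
    return sum(raw) / len(raw)

def distribute(product):
    # Data distribution: publish the result to downstream consumers.
    print(f"publishing product: {product}")

raw = produce()
product = transform(raw)
distribute(product)  # publishing product: 2.5
```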
“…Additionally, workflow techniques have been increasingly employed in scientific communities where scientists make discoveries by conducting complex sets of scientific computations and data analyses. These scientific explorative activities in biology, chemistry, engineering, geosciences, medicine and physics [6-8, 11, 13, 14, 22, 37, 43] are typically carried out at multiple sites, run over long periods of time and involve multidisciplinary processes and huge amounts of data. For example, a typical biological experimental scenario may consist of hundreds of computational steps; each step may be distributed over the Web by taking input data from various databases and disseminating the results obtained from one step to multiple downstream steps in a distributed environment.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
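The fan-out this quote describes, with one step's output feeding multiple downstream steps, amounts to executing a dependency DAG in topological order. Below is a minimal sketch using Python's standard-library graphlib; the step names are hypothetical stand-ins for the biological example, and a real engine would dispatch each step to a remote site rather than print it.

```python
# Minimal sketch of running a workflow DAG in dependency order.
from graphlib import TopologicalSorter

# Each (hypothetical) step maps to the set of steps it depends on.
dag = {
    "fetch_sequences": set(),
    "align": {"fetch_sequences"},
    "build_tree": {"align"},   # fan-out: 'align' feeds two steps
    "annotate": {"align"},
    "report": {"build_tree", "annotate"},
}

# static_order() yields steps so that every dependency runs first.
for step in TopologicalSorter(dag).static_order():
    print(f"running {step}")  # a real engine would dispatch each step
```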
“…The first challenge is the lack of sets of standard and commonly accepted descriptions for scientific workflows [19,39,43,13]. Currently, most workflow descriptions are based either on pure control flow models or on pure data flow models.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
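The distinction this quote draws can be made concrete: a control-flow description states step order explicitly, while a data-flow description lets order fall out of data availability. The sketch below runs the same two hypothetical steps both ways; the tiny data-flow scheduler is illustrative only, not any standard workflow language.

```python
# Minimal sketch contrasting control-flow and data-flow descriptions
# of the same two-step workflow. Step names and data are hypothetical.

def clean(data):
    # Drop missing values.
    return [x for x in data if x is not None]

def analyze(data):
    # Compute the mean.
    return sum(data) / len(data)

raw = [1.0, None, 3.0]

# Control-flow style: the order of steps is stated explicitly.
result = raw
for step in [clean, analyze]:
    result = step(result)
print(result)  # 2.0

# Data-flow style: each step names the data it consumes and produces;
# a step fires once its input exists, so ordering is implied.
data_flow = {
    "clean":   {"in": "raw",     "out": "cleaned", "fn": clean},
    "analyze": {"in": "cleaned", "out": "mean",    "fn": analyze},
}
store = {"raw": raw}
while data_flow:
    ready = [n for n, s in data_flow.items() if s["in"] in store]
    for name in ready:
        step = data_flow.pop(name)
        store[step["out"]] = step["fn"](store[step["in"]])
print(store["mean"])  # 2.0
```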