A framework for scientific workflow reproducibility in the cloud

Qasha, Rawaa Putros; Cała, Jacek; Watson, Paul

doi:10.1109/escience.2016.7870888

Cited by 29 publications

(17 citation statements)

References 24 publications

“…Such platforms include Nextflow [17], Bwb [18] and Pachyderm [19]. While containerization addresses many of the issues outlined above and has facilitated the execution of generic workflows in a language- and cloud-agnostic manner [20,21], IaaS services still require users to deploy and manage clusters. Resource management tools like Docker Swarm and Kubernetes are mature and widely used technologies that help manage container orchestration and even support auto-scaling of resources, but they still require an installation and configuration process that may be cumbersome, or in the case of managed solutions, expensive.…”

Section: Background and Related Work (mentioning)

confidence: 99%

Sweep

John, Ausmees, Muenzen, et al. 2019

Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion

Scientific and commercial applications are increasingly being executed in the cloud, but the difficulties associated with cluster management render on-demand resources inaccessible or inefficient to many users. Recently, the serverless execution model, in which the provisioning of resources is abstracted from the user, has gained prominence as an alternative to traditional cyberinfrastructure solutions. With its inherent elasticity, the serverless paradigm constitutes a promising computational model for scientific workflows, allowing domain specialists to develop and deploy workflows that are subject to varying workloads and intermittent usage without the overhead of infrastructure maintenance. We present the Serverless Workflow Enablement and Execution Platform (SWEEP), a cloud-agnostic workflow management system with a purely serverless execution model that allows users to define, run and monitor generic cloud-native workflows. We demonstrate the use of SWEEP on workflows from two disparate scientific domains and present an evaluation of performance and scaling.

“…TOSCA [15] is an OASIS standard to describe the topology of cloud-based applications towards portable, reproducible application deployments. Qasha et al [21] combine two execution-environment reproducibility techniques (i.e., the logical and physical preservation) of scientific workflows using TOSCA in a container-based approach. In addition to the plain reproducibility concerns, our middleware architecture employs reflection concepts to reconfigure deployment plans, resulting in efficient execution environments.…”

Section: Execution Environment Reproducibility (mentioning)

confidence: 99%

Infracomposer: Policy-driven adaptive and reflective middleware for the cloudification of simulation & optimization workflows

Beni, Lagaisse, Joosen 2019

Journal of Systems Architecture

The simulation and optimization of complex engineering designs in automotive or aerospace involves multiple mathematical tools, long-running workflows and resource-intensive computations on distributed infrastructures. Finding the optimal deployment in terms of task distribution, parallelization, collocation and resource assignment for each execution is a step-wise process involving both human input with domain-specific knowledge about the tools as well as the acquisition of new knowledge based on the actual execution history. In this paper, we present a policy-driven adaptive and reflective middleware that supports smart cloud-based deployment and execution of engineering workflows. This middleware supports deep inspection of the workflow task structure and execution, as well as of the very specific mathematical tools, their executions and used parameters. The reflective capabilities are based on multiple meta-models to reflect workflow structure, deployment, execution and resources. Adaptive deployment is driven by both human input as meta-data annotations as well as adaptation policies that reason over the actual execution history of the workflows. We validate and evaluate this middleware in real-life application cases and scenarios in the domain of aeronautics.

“…The former method is used in RO-Manager [16], a tool that uses the RO-Bundle specification [6]. A more recent approach relies on user action to create the topology, relationship, and node specifications based on a standard [17] that are eventually translated to a container [18]. In this paper, we focus on automatically creating research objects using AV.…”

Section: Related Work (mentioning)

confidence: 99%

Utilizing Provenance in Reusable Research Objects

et al. 2018

Science is conducted collaboratively, often requiring the sharing of knowledge about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. Computational provenance is often the key to enable such reuse. In this paper, we show how reusable research objects can utilize provenance to correctly repeat a previous reference execution, to construct a subset of a research object for partial reuse, and to reuse existing contents of a research object for modified reuse. We describe two methods to summarize provenance that aid in understanding the contents and past executions of a research object. The first method obtains a process-view by collapsing low-level system information, and the second method obtains a summary graph by grouping related nodes and edges with the goal to obtain a graph view similar to application workflow. Through detailed experiments, we show the efficacy and efficiency of our algorithms.

The minimum use-case for sharing a computational experiment (in the form of a shared research object) involves repeating its original execution and verifying its results. To truly exploit its potential, however, it must support modified reuse. Therefore, the research object must be created and stored not as a simple aggregation of digital content, as previously advocated [2,6], but in a readily-computable form: as a reusable research object.

We demonstrate the distinction in two ways. Consider a typical research paper with an analysis based on large amounts of code and data, and assume that the researcher authoring the paper has used the code and data to conduct a number of experiments that produce the paper's target figures and results. The example paper's digital artifacts relating to its experiments may be bundled together in a medium such as a file archive (.tar), compressed file format (.gz), virtual image, or container. A shared research object is free to use any of these mediums. A reusable research object, however, must use a virtual image or container, since it must produce a computational research object that, when downloaded and shared, will guarantee an instantly-executable unit of computation.

Also consider the example paper's metadata, which, similar to the metadata in most papers, is interspersed throughout the project's written analysis, and throughout its code and data. The metadata can take many forms, including annotations, version information, and provenance. A shared research object's metadata ...
