In the scientific community, one of the most vital challenges is the reproducibility of workflow execution. To reproduce the results of an experiment, provenance information must be collected on the one hand, and the dependencies of the execution must be eliminated on the other. With respect to the workflow execution environment, we differentiate four levels of provenance: infrastructural, environmental, workflow, and data provenance. During re-execution, components can change at every level, and capturing the data of each level addresses a different problem. For example, storing the environmental and infrastructural parameters enables the portability of workflows between different parallel and distributed systems (grid, HPC, cloud). The descriptors of the workflow model make it possible to track the different versions of the workflow and their impact on the execution. Our goal is to capture an optimal set of parameters, in both number and type, and to reconstruct the way data is produced independently of the environment. In this paper we investigate the necessary and sufficient parameters of workflow reproducibility and give a mathematical formula to determine the rate of reproducibility. These measurements allow the scientist to decide on the next steps toward creating reproducible workflows.
Scientific workflow systems aim to provide user-friendly, end-to-end solutions for automating and simplifying computationally intensive or data-intensive tasks. A number of workflow environments have been developed in recent years to support the specification and execution of scientific workflows. Conventional static workflows cope poorly with the ever-changing status of existing distributed systems. During workflow enactment, unforeseen scenarios may arise that cause significant delays, failed executions, or improper results. Manual workflow enactment has obvious limitations; automatic failover mechanisms, however, require at a minimum accurate information about the workflow tasks and about the status of the underlying processing infrastructure. Dynamism can be defined at different abstraction levels and in different phases of the workflow lifecycle. In this paper we identify the requirements of dynamic workflows in general and provide a thorough survey of the dynamic workflow handling capabilities of gUSE/WS-PGRADE.