Workflow is a well-established means by which to capture scientific methods in an abstract graph of interrelated processing tasks. The reproducibility of scientific workflows is therefore fundamental to reproducible e-Science. However, the ability to record all the required details so as to make a workflow fully reproducible is a long-standing problem that is very difficult to solve. In this paper, we introduce an approach that integrates system description, source control, container management and automatic deployment techniques to facilitate workflow reproducibility. We have developed a framework that leverages this integration to support workflow execution, re-execution and reproducibility in the cloud and in a personal computing environment. We demonstrate the effectiveness of our approach by examining various aspects of repeatability and reproducibility on real scientific workflows. The framework allows workflow and task images to be captured automatically, which improves not only repeatability but also runtime performance. It also gives workflows portability across different cloud environments. Finally, the framework can also track changes in the development of tasks and workflows to protect them from unintentional failures.
Several real-world problems in domain of healthcare, large scale scientific simulations, and manufacturing are organised as workflow applications. Efficiently managing workflow applications on the Cloud computing data-centres is challenging due to the following problems: (i) they need to perform computation over sensitive data (e.g. Healthcare workflows) hence leading to additional security and legal risks especially considering public cloud environments and (ii) the dynamism of the cloud environment can lead to several run-time problems such as data loss and abnormal termination of workflow task due to failures of computing, storage, and network services. To tackle above challenges, this paper proposes a novel workflow management framework call DoFCF (Deploy on Federated Cloud Framework) that can dynamically partition scientific workflows across federated cloud (public/private) data-centres for minimising the financial cost, adhering to security requirements, while gracefully handling run-time failures. The framework is validated in cloud simulation tool (CloudSim) as well as in a realistic workflow-based cloud platform (e-Science Central). The results showed that our approach is practical and is successful in meeting users security requirements and reduces overall cost, and dynamically adapts to the run-time failures.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.