2016 Seventh International Workshop on Data-Intensive Computing in the Clouds (DataCloud) 2016
DOI: 10.1109/datacloud.2016.004

Asterism: Pegasus and Dispel4py Hybrid Workflows for Data-Intensive Science

Abstract: We present Asterism, an open source data-intensive framework, which combines the strengths of traditional workflow management systems with new parallel stream-based dataflow systems to run data-intensive applications across multiple heterogeneous resources, without users having to: re-formulate their methods according to different enactment engines; manage the data distribution across systems; parallelize their methods; co-place and schedule their methods with computing resources; and store and transfer large/…

citations
Cited by 18 publications
(6 citation statements)
references
References 20 publications
“…RECUP: A (META)DATA FRAMEWORK FOR REPRODUCING HYBRID WORKFLOWS WITH FAIR. By: Line C. Pouchard, Tanzima Z. Islam, Bogdan Nicolae, and Robert Ross. Successfully reproducing results in computational campaigns that include data-intensive applications and machine learning (ML) is challenging even when the same data input and initial scripts are reused [40], [41]. Large-scale ensemble campaigns where workflow management systems execute tightly coupled task and data processes, so-called hybrid workflows [42], present additional reproducibility challenges due to workflow complexity, scale, distributed execution of tasks, and heterogeneous architectures. The inability to reproduce predictions obtained from ML applications and hence results from hybrid workflows presents a significant obstacle, impairing scientists' ability to validate and trust results; additionally, the lack of reproducibility can inhibit the adoption of research outcomes by others [40].…”
Section: B Features
Citation type: mentioning
confidence: 99%
“…On the other hand, other solutions combine existing frameworks to support Hybrid Workflows. Asterism [28] is a hybrid framework combining dispel4py and Pegasus at different levels to run data-intensive stream-based applications across platforms on heterogeneous systems. The main idea is to represent the different parts of a complex application as dispel4py workflows which are, then, orchestrated by Pegasus as tasks.…”
Section: Hybrid Framework
Citation type: mentioning
confidence: 99%
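The two-level idea in the statement above (stream-based workflows wrapped as tasks that a task-level engine then orders) can be illustrated with a minimal sketch. It uses only the Python standard library and does not call the dispel4py or Pegasus APIs; every name in it (`read_numbers`, `stream_workflow_filter`, `run_dag`, `TASKS`, `DEPS`) is hypothetical and chosen for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# --- Stream level: each stage is a generator that handles one item at a time.
# This only mimics the style of a stream-based dataflow; it is not dispel4py.
def read_numbers(n):
    for i in range(n):
        yield i

def square(stream):
    for x in stream:
        yield x * x

def keep_even(stream):
    for x in stream:
        if x % 2 == 0:
            yield x

def stream_workflow_filter():
    """First stream workflow: produce, square, and filter a small stream."""
    return list(keep_even(square(read_numbers(6))))

def stream_workflow_aggregate(upstream):
    """Second stream workflow: consume the first workflow's output."""
    return sum(upstream)

# --- Task level: a tiny stand-in for a task-oriented workflow engine.
# Each stream workflow is treated as one opaque task in a dependency graph.
TASKS = {
    "filter": stream_workflow_filter,
    "aggregate": stream_workflow_aggregate,
}
DEPS = {
    "filter": (),             # no predecessors
    "aggregate": ("filter",)  # runs after "filter" and receives its result
}

def run_dag(tasks, deps):
    """Run tasks in dependency order, passing predecessor results as inputs."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps.get(name, ())]
        results[name] = tasks[name](*inputs)
    return results

if __name__ == "__main__":
    print(run_dag(TASKS, DEPS))
```

In this sketch the task-level runner knows nothing about what happens inside each stream workflow, which is the separation of concerns the statement describes; a real deployment would replace both levels with the actual engines.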
“…These platforms have and will continue to evolve thereby increasing the importance of abstraction and automated mappings as a means of preserving the meaning of scientific methods. However, this can pose challenging set up requirements, to provide the initial enactment context or to rebuild an earlier enactment context for scientific reproducibility - some early experience tackling such issues has been reported [7,8]. Many of these data-intensive platforms also have their own languages for creating data-driven methods, that are intimately integrated and well presented.…”
Section: Minimised Recovery Costs After Partial Failures
Citation type: mentioning
confidence: 99%