Swift: A language for distributed parallel scripting

Wilde, Michael; Hategan, Mihael; Wozniak, Justin M.; Clifford, Ben; Katz, Daniel S.; Foster, Ian

doi:10.1016/j.parco.2011.05.005

Cited by 349 publications

(236 citation statements)

References 38 publications


“…3 In each scenario, above some CCR value, which depends on the failure rate and the workflow size, CkptSome leads to significant improvement over CkptAll. As the CCR decreases, the relative expected makespan of CkptAll decreases and converges to 1.…”

Section: Expected Makespan (mentioning)

confidence: 99%


Checkpointing Workflows for Fail-Stop Errors

Han, Canon, Casanova et al. 2017

2017 IEEE International Conference on Cluster Computing (CLUSTER)


Abstract: We consider the problem of orchestrating the execution of workflow applications structured as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail-stop failures. The objective is to minimize expected overall execution time, or makespan. A solution to this problem consists of a schedule of the workflow tasks on the available processors and of a decision of which application data to checkpoint to stable storage, so as to mitigate the impact of processor failures. For general DAGs this problem is hopelessly intractable. In fact, given a solution, computing its expected makespan is still a difficult problem. To address this challenge, we consider a restricted class of graphs, Minimal Series-Parallel Graphs (M-SPGs). It turns out that many real-world workflow applications are naturally structured as M-SPGs. For this class of graphs, we propose a recursive list-scheduling algorithm that exploits the M-SPG structure to assign sub-graphs to individual processors, and uses dynamic programming to decide which tasks in these sub-graphs should be checkpointed. Furthermore, it is possible to efficiently compute the expected makespan for the solution produced by this algorithm, using a first-order approximation of task weights and existing evaluation algorithms for 2-state probabilistic DAGs. We assess the performance of our algorithm for production workflow configurations, comparing it to (i) an approach in which all application data is checkpointed, which corresponds to the standard way in which most production workflows are executed today; and (ii) an approach in which no application data is checkpointed. Our results demonstrate that our algorithm strikes a good compromise between these two approaches, leading to lower checkpointing overhead than the former and to better resilience to failure than the latter. To the best of our knowledge, this is the first scheduling/checkpointing algorithm for workflow applications with fail-stop failures that considers workflow structures more general than mere linear chains of tasks.

Key-words: workflow, checkpoint, fail-stop error, resilience.

Checkpointing strategies for workflows in the presence of fatal errors

Résumé (translated from French): This report considers the scheduling of workflows (applications structured as directed acyclic task graphs, or DAGs) on large-scale parallel platforms subject to fatal errors. The objective is to minimize the expected total execution time, or makespan. A solution to this problem comprises the ordered allocation of tasks to processors and the checkpoint decisions: which tasks are followed by a checkpoint? Even for a given solution, computing the makespan remains difficult. We restrict ourselves to a particular class of DAGs, minimal series-parallel graphs, or M-SPGs. Many workflows arising from applications have an M-SPG as their graph. For such graphs, we propose an algorithm that uses the recursive structure of the M-SPG to allocate sub-graphs to each pro...


“…For instance, in production Workflow Management Systems (WMSs) [1,2,3,4,5,6], the default behavior is that all output data is saved to files and all input data is read from files, which is exactly the CkptAll strategy. While this strategy leads to fast restarts in case of failures, its downside is that it maximizes checkpointing overhead.…”

Section: Introduction (mentioning)

confidence: 99%

Checkpointing Workflows for Fail-Stop Errors

Han, Canon, Casanova et al. 2017

2017 IEEE International Conference on Cluster Computing (CLUSTER)


“…• Swift [62] is a dataflow language for scientific computing designed to enable easy composition of independent software tools and procedures into large-scale, throughput-oriented parallel workflows that can be executed in cluster, cloud, and grid environments. Concurrency can be achieved with Swift/T, the MPI version of Swift.…”

Section: Classification Of Workflow Management Systems (mentioning)

confidence: 99%

A characterization of workflow management systems for extreme-scale applications

Silva, Filgueira, Pietri et al. 2017

Future Generation Computer Systems


Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow management systems relieve scientists from the details of an application and manage its execution on a computational infrastructure. As the resource requirements of today's computational and data science applications that process vast amounts of data keep increasing, there is a compelling case for a new generation of advances in high-performance computing, commonly termed as extreme-scale computing, which will bring forth multiple challenges for the design of workflow applications and management systems. This paper presents a novel characterization of workflow management systems using features commonly associated with extreme-scale computing applications. We classify 15 popular workflow management systems in terms of workflow execution models, heterogeneous computing environments, and data access methods. The paper also surveys workflow applications and identifies gaps for future research on the road to extreme-scale workflows and management systems.


“…In this approach, the job load is adjusted automatically without running time prediction. Wilde et al. proposed Swift, a scripting language for distributed computing [9]. Swift focuses on the concurrent execution, composition, and coordination of large-scale independent computational tasks.…”

Section: Literature Survey (mentioning)

confidence: 99%

DCHEFT Approach for Task Scheduling to Efficient Resource Allocation in Cloud Computing

Gawali, Shinde 2017

IJEACS


Abstract: Task scheduling is an important aspect of improving resource utilization in cloud computing. This paper proposes a divide-and-conquer based approach for the heterogeneous earliest finish time (HEFT) algorithm. The proposed system works in two phases. In the first phase, it assigns ranks to the incoming tasks according to their size. In the second phase, it assigns and manages each task on a virtual machine, taking into account the ideal time of the respective virtual machine. This helps to achieve more effective resource utilization in cloud computing. The experimental results using the Cybershake scientific workflow show that the proposed Divide and Conquer HEFT (DCHEFT) performs better than HEFT in terms of task finish time and response time.
