Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example

Deelman; Callaghan; Francoeur; Graves; Gupta, Puneet; Jordan, Vanessa; Kesselman, Carl; Maechling, P. J.; Mehringer; Mehta; Okaya; Vahi; Zhao, Fuqiang

doi:10.1109/e-science.2006.261098

Cited by 65 publications

(58 citation statements)

References 22 publications


“…Gravitational Wave Observatory. CyberShake [92] is used by the Southern California Earthquake Center to characterize earthquake hazards in a region. The workflow jobs are in the Directed Acyclic Graph in XML (DAX) format and generated using the workflow generator from [25].…”

Section: Job Scheduling (mentioning)

confidence: 99%

Cloud Auto-Scaling with Deadline and Budget Constraints

Mao


The cloud has become an important computing platform. It has attracted many businesses and individual users by offering on-demand computing power and storage capacity. The economies of scale and the pay-as-you-go billing model can save users large up-front capital investments and long-term operating costs. A key feature of the cloud is elasticity, the ability to dynamically acquire and release computing resources in response to demand. We believe the key to successful cloud adoption is to first decide how much and what type of resources are needed in the cloud ("provisioning") and then decide how to place computing activities onto each of the resources ("allocation"). This is a challenging problem because the mapping from user objectives to the resource provisioning and allocation plans is not trivial. It must carefully consider the following factors. A performance goal can be achieved through different types of resources with different costs. A fixed budget can be used to rent a wide variety of resource configurations for varying durations. The structure of a cloud application can be complex. Task precedence orders need to be preserved in a job. The workload may experience unexpected peaks. The performance requirements and cost constraints may change dynamically. This dissertation solves this resource provisioning and allocation problem using an auto-scaling approach. It solves the batch-queue application model based on the integer programming technique. By ensuring the computing power is always large enough to handle the workload for all the VM types, in our experiment, our approach finishes more than 95% of jobs before the deadline and saves 20.2% to 40.1% in cost compared to a fixed machine type choice. Our approach contains several innovative heuristics for the workflow application model.
In the unlimited budget case, the presented solution, dynamic scaling-consolidation-scheduling (SCS), can save 9.4% to 40.4% in cost compared to two baseline approaches and works well in both light and heavy workload environments. In the limited budget case, our scheduling-first and scaling-first algorithms can reduce job turnaround time by 9.8% to 45.2% compared to the standard machine choice, and they also show good tolerance (between -10.2% and 16.7%) to inaccurate parameters (±20% error). Finally, this dissertation presents three job scheduling policies and a data prefetching strategy to manage the intermediate data for data-intensive applications running in the cloud. In particular, the cost-deadline-first (CDF) algorithm can save 13.5% to 33.7% in cost compared to the deadline-first (DF) algorithm, and the data prefetching strategy can further improve the cost saving up to 44.6% through data-locality-aware job placement.


“…On the other hand, the focus of this paper is effectively retrieving information from hierarchical workflows and presenting it to the user according to specified keywords. There is also research on how to effectively execute a specific type of workflow in various circumstances [39,24,18,23]. These are orthogonal problems of defining query results on workflow hierarchies, as individual execution methods of a workflow hierarchy do not affect the specification of views on the workflow hierarchy, hence do not affect the definition of query results and the algorithms to generate query results.…”

Section: Queries For Evaluation (mentioning)

confidence: 99%

Searching workflows with hierarchical views

2010


Workflows are prevalent in diverse applications, which can be scientific experiments, business processes, web services, or recipes. With the dramatically growing number of workflows, there is an increasing need for people to search a workflow repository using keywords and to retrieve the relevant ones. A workflow hierarchy is a three-dimensional object containing multiple abstraction views of different granularity on the same workflow. This unique structure poses a new set of challenges compared to keyword search on tree or graph structures typically found in relational or XML data. In this paper, we define an informative, self-contained and concise search result on workflows to be a projection of a workflow hierarchy on a two-dimensional viewing plane inferred from user queries. We then design and develop an efficient keyword search engine for workflows. Experimental evaluation demonstrates the effectiveness of our approach.


“…In such cases we can use Wings and Pegasus to iteratively instantiate and map the workflow. Figure 5 shows the interaction between Wings and Pegasus when instantiating a workflow in an earthquake science application CyberShake [9]. Initially the first portion of the workflow is instantiated by Wings and sent to Pegasus for mapping.…”

Section: Figure 4: a Schematic Of A Portal For Workflowbased Applicat (mentioning)

confidence: 99%

Managing Large-Scale Scientific Workflows in Distributed Environments: Experiences and Challenges

Deelman; Gil

2006

2006 Second IEEE International Conference on E-Science and Grid Computing (E-Science'06)

Self Cite


In this paper we discuss several challenges associated with scientific workflow design and management in distributed, heterogeneous environments. Based on our prior work with a number of scientific applications, we describe the workflow lifecycle and examine our experiences and the challenges ahead as they pertain to the user experience, planning the workflow execution, and managing the execution itself.
