2020 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC) 2020
DOI: 10.1109/urgenthpc51945.2020.00007

A Bespoke Workflow Management System for Data-Driven Urgent HPC

Abstract: In this paper we present a workflow management system which permits the kinds of data-driven workflows required by urgent computing, namely where new data is integrated into the workflow as a disaster progresses in order to refine the predictions as time goes on. This allows the workflow to adapt to new data at runtime, a capability that most workflow management systems do not possess. The workflow management system was developed for the EU-funded VESTEC project, which aims to fuse HPC with real-time data for sup…

Cited by 6 publications (5 citation statements)
References 6 publications
“…Often one is not aware of the full set of criteria that schedulers are using to determine when jobs will run, and there can be complicated inter‐job relationships at play too. Driven by our interest in urgent computing workloads [3], we require the ability to quickly predict how long a job will queue for on a given HPC machine before running based because we require that urgent jobs start to run as soon as possible. This means that we require the predicted start time to be reported in minutes and seconds, and it is important that the accuracy of such predictions is within a few minutes.…”
Section: Background and Related Work
Citation type: mentioning (confidence: 99%)
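The requirement quoted above (a predicted start time reported in minutes and seconds, accurate to within a few minutes) can be made concrete with a small evaluation helper. This is an illustrative sketch only. The function names `format_min_sec` and `fraction_within`, the five-minute tolerance, and the sample queue times are assumptions, not anything taken from the cited work.

```python
def format_min_sec(seconds):
    """Render a non-negative duration in seconds as 'Xm YYs', truncating fractions."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs:02d}s"


def fraction_within(predicted, actual, tolerance_seconds=300.0):
    """Share of jobs whose predicted queue time is within tolerance_seconds of the actual one.

    The 300 s (five-minute) default is an assumed, illustrative reading of
    "within a few minutes".
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("expected two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance_seconds)
    return hits / len(predicted)


predicted = [120, 900, 60, 1800]  # predicted queue waits in seconds (made-up values)
actual = [200, 1500, 30, 1700]    # observed queue waits in seconds (made-up values)
print(format_min_sec(predicted[1]))        # 15m 00s
print(fraction_within(predicted, actual))  # 0.75
```

With these made-up numbers, three of the four predictions fall inside the five-minute window. The 600 s miss on the second job is the kind of error the quoted requirement rules out.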
“…The VESTEC marshalling and control system [2] has been developed as a generic solution for running urgent, interactive workloads on HPC machines. Integrating use‐cases ranging from wildfire fighting [3] to tracking mosquito‐borne diseases [4], these represent highly dynamic workloads, often driven by the arrival of data from external sources or interactivity from the end‐user, with the requirement that such workloads must start to run as quickly as possible. Consequently being able to accurately estimate how long jobs will likely queue before they start to run on compute nodes, across several supercomputers, is critical in providing optimal workload placement.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
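Placing a workload across several supercomputers using queue-time estimates can be sketched as choosing the machine with the shortest predicted wait. This is a minimal illustration that assumes a hypothetical `MachineEstimate` record and a hypothetical `choose_machine` function. It is not the VESTEC placement logic, which the quoted passage does not describe in detail.

```python
from dataclasses import dataclass


@dataclass
class MachineEstimate:
    """Hypothetical record: one HPC machine and its predicted queue wait for a job."""
    name: str
    predicted_queue_seconds: float  # predicted wait before the job starts running
    accepts_job: bool = True        # False if the machine cannot run this job at all


def choose_machine(estimates):
    """Pick the machine expected to start the job soonest (shortest predicted queue)."""
    candidates = [e for e in estimates if e.accepts_job]
    if not candidates:
        raise ValueError("no machine can accept this job")
    return min(candidates, key=lambda e: e.predicted_queue_seconds).name


estimates = [
    MachineEstimate("machine-a", predicted_queue_seconds=754.0),
    MachineEstimate("machine-b", predicted_queue_seconds=95.0),
    MachineEstimate("machine-c", predicted_queue_seconds=10.0, accepts_job=False),
]
print(choose_machine(estimates))  # machine-b
```

Ties go to whichever machine appears first in the list, because `min` returns the first minimal element. The quality of the placement depends entirely on how accurate the queue-time predictions are.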
“…computers to mobile devices suited to real-time operations. High-performance computing capabilities allow for fire behavior estimations and derived fire danger indices, the use of stochastic approaches to assess risk and potential impacts at large scales, and permitting the kinds of data-driven workflows required by urgent computing [27].…”
Section: Fire Simulation Framework
Citation type: mentioning (confidence: 99%)
“…The blue box in Figure 1 contains the marshalling and control functionality of the VESTEC system, which drive the execution of workloads across the HPC machine(s). Workflows are a fundamental aspect [7], which represent the different stages of progression through a disaster's lifetime. The stages comprising a workflow are triggered by some combination of external stimulus and/or preceding workflow stages.…”
Section: A Marshalling and Control In The Vestec System
Citation type: mentioning (confidence: 99%)
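The triggering model described above, in which workflow stages fire on an external stimulus and/or the completion of a preceding stage, can be sketched as a small event queue. Everything in this sketch is hypothetical and illustrative, not the API of the VESTEC workflow management system. That covers the `Workflow` class, its `stage`/`post`/`run` methods, the `"<stage>.done"` event naming, and the toy wildfire stages.

```python
from collections import defaultdict, deque


class Workflow:
    """Hypothetical event-driven workflow (an illustration, not VESTEC's implementation).

    Stages subscribe to event names. An event comes either from an external
    source (e.g. new observational data arriving) or from a preceding stage,
    which emits "<stage name>.done" with its result when it finishes.
    """

    def __init__(self):
        self._handlers = defaultdict(list)  # event name -> [(stage name, function)]
        self._pending = deque()             # events not yet dispatched
        self.log = []                       # stage names in the order they ran
        self.results = {}                   # latest result of each stage

    def stage(self, name, triggers):
        """Decorator: run the function as stage `name` whenever a trigger event fires."""
        def register(func):
            for event in triggers:
                self._handlers[event].append((name, func))
            return func
        return register

    def post(self, event, payload=None):
        """Queue an event, such as an external stimulus carrying new data."""
        self._pending.append((event, payload))

    def run(self):
        """Dispatch queued events until none remain."""
        while self._pending:
            event, payload = self._pending.popleft()
            for name, func in self._handlers.get(event, []):
                result = func(payload)
                self.log.append(name)
                self.results[name] = result
                self.post(f"{name}.done", result)  # may trigger downstream stages


wf = Workflow()


@wf.stage("ingest", triggers=["new_data"])
def ingest(observations):
    return {"hotspots": list(observations)}


@wf.stage("simulate", triggers=["ingest.done"])
def simulate(state):
    return f"forecast from {len(state['hotspots'])} hotspots"


@wf.stage("report", triggers=["simulate.done"])
def report(forecast):
    return forecast.upper()


wf.post("new_data", [(41.1, 1.2), (41.3, 1.4)])              # first data arrives
wf.run()
wf.post("new_data", [(41.1, 1.2), (41.3, 1.4), (41.5, 1.6)])  # more data arrives later
wf.run()
print(wf.log)               # ['ingest', 'simulate', 'report', 'ingest', 'simulate', 'report']
print(wf.results["report"]) # FORECAST FROM 3 HOTSPOTS
```

Posting a second `new_data` event re-runs the downstream chain with the newer observations. In miniature, this mirrors how data arriving as a disaster progresses can refine a workflow's predictions.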