2020
DOI: 10.1007/978-3-030-57675-2_26

Managing Failures in Task-Based Parallel Workflows in Distributed Computing Environments

Abstract: Current scientific workflows are large and complex. They normally perform thousands of simulations whose results, combined with searching and data analytics algorithms in order to infer new knowledge, generate a very large amount of data. To this end, workflows comprise many tasks, and some of them may fail. Most of the work done on failure management in workflow managers and runtimes focuses on recovering from failures caused by resources (retrying or resubmitting the failed computation in other resources, …

Citations: cited by 9 publications (8 citation statements)
References: references 13 publications

“…We extended the PyCOMPSs syntax to enable the developer to indicate the desired behavior if an error occurred in a task. 11 For example, this interface allows you to tell the runtime to ignore individual tasks' errors and continue execution and cancel the execution of erroneous task successors.…”
Section: While the BioBB Development Team Has Conducted Its Research ... (citation type: mentioning)
confidence: 99%
“…PyCOMPSs has been enhanced with new features, which were fresh and exciting enough to imply innovative research contributions. 11 A significant plus for our team is that these new features were helpful for the BioBB workflows and have fostered the research and development of workflows for molecular dynamics. Finally, these activities impact the user community, with new workflows available for their research, and have been helpful for emerging research activities, such as the COVID-19 investigations.…”
Section: Impact (citation type: mentioning)
confidence: 99%
“…Function failures, caused by various reasons, affect the entire FC [36]. Some functions may not even start if the input JSON file exceeds the size limit or there is an authentication error.…”
Section: Function Failures Affect the Entire FC (citation type: mentioning)
confidence: 99%
“…With regard to fault tolerance, a mechanism at task level is provided, where the programmer can indicate in a decorator the behavior to implement in case of task failure (i.e., ignore the failure of the task and continue, stop the whole workflow, etc.) [17]. In addition, a checkpointing mechanism at task level has been implemented, which makes it possible to recover a failed execution from the last checkpointed task.…”
Section: A COMPSs (citation type: mentioning)
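
The task-level behaviors described in the statements above (ignore a failure and continue, stop the whole workflow, cancel the successors of a failed task) can be illustrated with a small sketch. This is plain Python written for illustration only: it is not the PyCOMPSs API, and the names `OnFailure`, `task`, `CANCELLED`, and the example functions are hypothetical. It assumes tasks run sequentially and that a successor is any task receiving a failed task's result as an argument.

```python
from enum import Enum
from functools import wraps


class OnFailure(Enum):
    """Hypothetical failure policies, modelled on the behaviors quoted above."""
    FAIL = "fail"                            # propagate the error, stopping the workflow
    IGNORE = "ignore"                        # swallow the error and continue
    CANCEL_SUCCESSORS = "cancel_successors"  # skip tasks that depend on the failed result


class _Cancelled:
    """Marker returned in place of a result when a task is cancelled."""
    def __repr__(self):
        return "CANCELLED"


CANCELLED = _Cancelled()


def task(on_failure=OnFailure.FAIL, default=None):
    """Illustrative decorator: attach a failure policy to a function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # A task that receives a cancelled input is cancelled as well.
            inputs = list(args) + list(kwargs.values())
            if any(value is CANCELLED for value in inputs):
                return CANCELLED
            try:
                return func(*args, **kwargs)
            except Exception:
                if on_failure is OnFailure.IGNORE:
                    return default       # continue with a fallback value
                if on_failure is OnFailure.CANCEL_SUCCESSORS:
                    return CANCELLED     # downstream tasks will be skipped
                raise                    # FAIL: let the error stop the workflow
        return wrapper
    return decorator


@task(on_failure=OnFailure.IGNORE, default=0.0)
def simulate(x):
    if x < 0:
        raise ValueError("negative input")
    return x * 2.0


@task(on_failure=OnFailure.CANCEL_SUCCESSORS)
def load(path):
    raise OSError(f"cannot read {path}")


@task()
def analyse(data):
    return len(data)


@task()
def strict(x):
    raise RuntimeError("unrecoverable")
```

With these definitions, `simulate(-1)` returns the fallback value instead of raising, `analyse(load("missing.dat"))` is skipped because its input was cancelled, and `strict(1)` raises and would stop a surrounding workflow. A real runtime would apply such policies across a distributed task graph rather than through return values.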
confidence: 99%
“…However, many scientific workflows use very big files as inputs, and to avoid packaging them in the RO-Crate and having big data movements between environments, we will add them as URIs, so users know where to find them to reproduce an execution. This approach is commonly used in the life sciences domain, where online catalogs of commonly used files (e.g., the EGA Archive 17 ) are provided, and applications that process them only need to refer to the URI where they can be found for downloading.…”
Section: A Registered Data Assets Using COMPSs (citation type: mentioning)
confidence: 99%