49th International Conference on Parallel Processing - ICPP 2020
DOI: 10.1145/3404397.3404418

Robustness of the Young/Daly formula for stochastic iterative applications

Abstract: The Young/Daly formula for periodic checkpointing is known to hold for a divisible load application where one can checkpoint at any time-step. In a nutshell, the optimal period is $P_{YD} = \sqrt{2 \mu_f C}$, where $\mu_f$ is the Mean Time Between Failures (MTBF) and $C$ is the checkpoint time. This paper assesses the accuracy of the formula for applications decomposed into computational iterations where: (i) the duration of an iteration is stochastic, i.e., obeys a probability distribution law $D$ of mean $\mu_D$; and (ii) one can ch…
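As a rough illustration of the formula quoted in the abstract, the sketch below computes the Young/Daly period in its standard first-order form, the square root of twice the MTBF times the checkpoint time. The function name `young_daly_period` and the numeric values (a 24-hour MTBF and a 10-minute checkpoint) are assumptions made for this example and do not come from the paper.

```python
import math


def young_daly_period(mtbf: float, checkpoint_time: float) -> float:
    """Return sqrt(2 * mtbf * checkpoint_time).

    Both arguments must be positive and expressed in the same time unit;
    the result is in that unit as well.
    """
    if mtbf <= 0 or checkpoint_time <= 0:
        raise ValueError("mtbf and checkpoint_time must be positive")
    return math.sqrt(2.0 * mtbf * checkpoint_time)


# Hypothetical values for illustration only: MTBF of 24 h and a
# checkpoint time of 10 min, both converted to seconds.
period = young_daly_period(mtbf=24 * 3600, checkpoint_time=600)
print(f"{period:.0f} s (about {period / 3600:.2f} h)")
# prints "10182 s (about 2.83 h)"
```

With these assumed values the period is about 2.83 hours. Because of the square root, quadrupling either the MTBF or the checkpoint time only doubles the period.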

Cited by 3 publications (3 citation statements)
References 30 publications
“…Alternative approach To understand the actual behavior of the system, some work [7] consider the bounded slowdown as a function of the size of the job. In this case, this objective is not one to optimize anymore, but more a qualitative way to measure and understand the performance of a solution.…”
Section: Mean (Bounded) Slowdown
Citation type: mentioning
confidence: 99%
“…Note that when jobs fail to complete fully (for instance because their walltime is underestimated), it is interesting to measure the "useful utilization", i.e. the volume of computation that lead to a successful execution [7]. Limits for HPC workloads One of the main limitation concerns machines with lower submission rate (i.e.…”
Section: Utilization
Citation type: mentioning
confidence: 99%
“…Recently, the results of [44] have been extended to deal with linear chains whose tasks do not have constant execution times but instead obey some probability distribution [13]. As pointed out above, for general workflows, deciding which tasks to checkpoint has been shown #P-complete [21], but the results of [3] show that if the graph is scheduled in a sequential manner (linearized), then one can derive an optimal checkpointing strategy.…”
Section: Checkpointing
Citation type: mentioning
confidence: 99%