2010
DOI: 10.1142/s0129054110007283

Incremental Checkpoint Schemes for Weibull Failure Distribution

Abstract: Incremental checkpoint mechanism was introduced to reduce high checkpoint overhead of regular (full) checkpointing, especially in high-performance computing systems. To gain an extra advantage from the incremental checkpoint technique, we propose an optimal checkpoint frequency function that globally minimizes the expected wasted time of the incremental checkpoint mechanism. Also, the re-computing time coefficient used to approximate the re-computing time is derived. Moreover, to reduce the complexity in the r…

Cited by 17 publications (26 citation statements)
References 4 publications
“…Here, time between failures is the Weibull distribution. The failure rate increase or decrease with time, it may not [11], [12]. Several studies analyze the time [8], [9], [12], [13], [14], [15].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…We make use of the terms node, processor and resource interchangeably. The time between failures of nodes is taken as to follow the Weibull distribution [8], [9], [10], [11].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…However, the models in [9][10][11][12][13][14] assume that no failure event occurs during the rollback recovery phase, which is not a considerate representation for the characteristic of the rollback recovery execution. Besides, [15][16][17][18][19] also intended to determine the optimal checkpoint sequence under a certain circumstance in terms of the failure distribution.…”
Section: Introduction
Citation type: mentioning
confidence: 99%