2014
DOI: 10.3233/jhs-140492

Lightweight coordinated checkpointing in cloud computing

Abstract: Fault tolerance is an essential issue in cloud computing, needed to cope with failures and minimize their damage. Checkpointing is a powerful fault tolerance technique that consists of saving the transient state of a computation to persistent storage, from which execution can be restarted in case of failure. Coordinated checkpointing is an efficient checkpointing strategy because it is free of the domino effect and needs only the last stored checkpoint to ensure a consistent state. In this p…
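The idea in the abstract, that a coordinated checkpoint lets every process restart from its last stored checkpoint, can be illustrated with a minimal in-memory sketch. This is not the lightweight protocol proposed in the paper (the abstract is truncated before describing it); it is a generic blocking two-phase scheme, and the names `Process`, `coordinated_checkpoint`, and `recover` are hypothetical. It also assumes there are no in-flight messages between processes, which a real protocol would have to handle.

```python
import copy


class Process:
    """Hypothetical process whose whole state is one dictionary."""

    def __init__(self, name):
        self.name = name
        self.state = {"step": 0}
        self.tentative = None                        # phase-1 checkpoint, not yet valid
        self.committed = copy.deepcopy(self.state)   # last globally consistent checkpoint

    def work(self):
        self.state["step"] += 1

    def take_tentative(self):
        # Phase 1: save a tentative checkpoint and acknowledge the coordinator.
        self.tentative = copy.deepcopy(self.state)
        return True

    def commit(self):
        # Phase 2: the tentative checkpoint replaces the previous one.
        self.committed = self.tentative
        self.tentative = None

    def abort(self):
        self.tentative = None

    def rollback(self):
        self.state = copy.deepcopy(self.committed)


def coordinated_checkpoint(processes):
    """Commit a new checkpoint only if every process acknowledged phase 1."""
    acks = [p.take_tentative() for p in processes]
    if all(acks):
        for p in processes:
            p.commit()
        return True
    for p in processes:
        p.abort()
    return False


def recover(processes):
    """After a failure, every process returns to its last committed checkpoint."""
    for p in processes:
        p.rollback()


procs = [Process("a"), Process("b")]
for p in procs:
    p.work()
    p.work()
coordinated_checkpoint(procs)   # all processes checkpoint at step 2
procs[0].work()                 # "a" runs ahead to step 3, then a failure occurs
recover(procs)
print([p.state["step"] for p in procs])
```

Because all processes commit together, recovery never has to search through older checkpoints, which is the sense in which the scheme avoids the domino effect.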

Cited by 8 publications (4 citation statements)
References 21 publications

“…According to Kuang (2014), checkpointing and rollback recovery schemes are the famous backward fault tolerant techniques to minimize the execution time of the long-running applications, such as scientific computing and telecommunication applications. The system takes checkpoints according to some specified policy, and recovers automatically from the transient fault if they occur. Specifically, the time between two successive checkpoints is referred to as the checkpoint interval. According to Meroufel (2014), a saved state of the process is called a checkpoint, to reduce the number of logs to be replayed during the rollback recovery. During failure-free execution, the time between two consecutive checkpoints is referred to as the checkpoint interval according to Islam (2014). The checkpoint interval is one of the major factors influencing the performance of the fault tolerant scheme according to Mendizabal (2014) and Awasthi (2014). As the checkpoint interval decreases, in the presence of the failure event, the computation loss decreases. However, excessive checkpointing operations incur high overhead during the normal failure-free execution and may result in severe performance degradation. On the contrary, as the checkpoint interval increases, the overhead for the checkpointing operation during the failure-free execution decreases. However, the computation loss caused by the failure event increases and deficient checkpointing may incur an expensive rollback recovery overhead. Therefore, a trade-off must be made to determine a proper checkpoint interval for high fault tolerant performance according to Elnozahy (2002) and Treaster (2005).…”
Section: Introduction (mentioning)
confidence: 99%
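The trade-off described in the statement above can be made concrete with a first-order cost model. The sketch below is not taken from the cited papers: it assumes a fixed checkpoint cost, failures that land on average halfway through an interval, and a mean time between failures (MTBF) much larger than the interval. Under those assumptions the overhead is minimized by Young's classical approximation, T = sqrt(2 · C · MTBF). The function names and the numbers used are illustrative only.

```python
import math


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """First-order estimate of time wasted per unit of useful work.

    checkpoint_cost / interval : time spent writing checkpoints
                                 (grows as the interval shrinks)
    interval / (2 * mtbf)      : expected recomputation after a failure,
                                 assuming it hits mid-interval on average
                                 (grows as the interval lengthens)
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Interval minimizing overhead_fraction (Young's approximation)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


# Assumed example values: 2 time units per checkpoint, MTBF of 100 time units.
cost, mtbf = 2.0, 100.0
best = young_interval(cost, mtbf)
for interval in (best / 2, best, best * 2):
    print(interval, overhead_fraction(interval, cost, mtbf))
```

Intervals shorter or longer than the computed one both give a higher estimated overhead, which mirrors the two failure modes named in the quoted passage: excessive checkpointing and deficient checkpointing.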
“…In checkpointing and rollback-recovery schemes, each replicated state of the process is called a checkpoint [1][2]. Upon a fault, a recovery mechanism brings the failed process back to normal execution [3][4][5].…”
Section: Introduction (mentioning)
confidence: 99%
“…For the fault tolerance of mobile computing, checkpointing and rollback recovery are well-known backward error recovery techniques to minimize loss of computation in the presence of process faults (Kuang et al., 2014; Meroufel et al., 2014). Basically, the transparent fault tolerant schemes that do not require user interaction can be classified into two categories: checkpoint-based and log-based rollback recovery schemes (Islam et al., 2014; Mendizabal et al., 2014; Awasthi et al., 2014).…”
Section: Introduction (mentioning)
confidence: 99%