2015
DOI: 10.1002/sec.1187

CDMCR: multi‐level fault‐tolerant system for distributed applications in cloud

Abstract: Cloud computing gives users a new model for using computing infrastructure, with the ability to perform parallel and distributed computations on elastic virtual clusters. However, its multi-level and complex architecture makes a cloud computing system more prone to failure. In this paper, we present a multi-level fault-tolerant system for distributed applications in the cloud named Distributed-application oriented Multi-level Checkpoint/Restart for Cloud (CDMCR). The CDMCR system backs up the complete state of appli…
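
The abstract is truncated and does not spell out how CDMCR captures or restores state, so the sketch below is only a generic illustration of the checkpoint/restart idea it builds on, not CDMCR's implementation. It assumes a single process whose whole state fits in a small JSON-serializable dict; the names `save_checkpoint`, `load_checkpoint`, `run`, and the `interval` and `fail_at` parameters are hypothetical. The application saves its state every `interval` steps and, after a failure, resumes from the latest checkpoint rather than from the beginning.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    # Write atomically: dump to a temp file in the same directory, then rename,
    # so a crash mid-write never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    # Return the most recent saved state, or None if no checkpoint exists yet.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, interval, fail_at=None):
    # Hypothetical workload: accumulate the sum of step indices.
    # Resume from the latest checkpoint if there is one, else start from step 0.
    state = load_checkpoint(path) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state
```

With `interval=5` and a simulated failure at step 7, the last checkpoint holds the state after step 5, so a restart recomputes only steps 5 and 6 before finishing; the work lost to a failure is bounded by the checkpoint interval.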

Cited by 9 publications (2 citation statements)
References: 24 publications
“…However, focusing on checkpointing, Qiang et al . presented a multi‐level fault‐tolerant system for distributed applications in cloud named distributed application‐oriented multi‐level checkpoint/restart for cloud. The system backs up complete application states periodically as a snapshot‐based‐distributed checkpointing protocol, including file system state.…”
Section: Related Work (mentioning)
confidence: 99%
“…By capturing the global state of resources periodically, artificial intelligence applications can resume the computation from the latest snapshot, not from the beginning, while guaranteeing the service level agreement (SLA). However, capturing the global state of resources in cloud computing environments is not a trivial task since individual resources are independent units and, therefore, the snapshot protocol can only be done by passing messages due to the lack of shared memory between the nodes in the system [10,11].…”
Section: Introduction (mentioning)
confidence: 99%
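
The statement above notes that, without shared memory, a global snapshot can only be taken by passing messages. The classic algorithm for this is the Chandy–Lamport marker protocol; the Python sketch below simulates it in one process purely for illustration, and it is not claimed to be the protocol used by CDMCR or by the citing work. It assumes reliable FIFO channels between every pair of processes and a hypothetical money-transfer workload (the `System`, `Process`, `transfer`, and `deliver` names are invented here). Each process records its local state when it first sees a marker, forwards markers on all outgoing channels, and records as in transit any message that arrives on a channel after it recorded but before that channel's marker.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance
        self.peers = peers          # ids of all other processes
        self.recorded_state = None  # local state captured for the snapshot
        self.recording = {}         # incoming channel -> messages caught in transit
        self.done = set()           # incoming channels whose marker has arrived


class System:
    def __init__(self, balances):
        ids = list(range(len(balances)))
        self.procs = [Process(i, b, [j for j in ids if j != i])
                      for i, b in enumerate(balances)]
        # One FIFO channel per ordered pair (src, dst), as the protocol assumes.
        self.chan = {(i, j): deque() for i in ids for j in ids if i != j}

    def transfer(self, src, dst, amount):
        # Application message: money leaves src now and arrives at dst on delivery.
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def _record(self, p):
        # Record local state, start recording every incoming channel,
        # then send a marker on every outgoing channel.
        p.recorded_state = p.balance
        for q in p.peers:
            p.recording[(q, p.pid)] = []
            self.chan[(p.pid, q)].append(MARKER)

    def start_snapshot(self, initiator):
        self._record(self.procs[initiator])

    def deliver(self, src, dst):
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self._record(p)  # first marker: its channel is recorded as empty
            p.done.add((src, dst))
        else:
            p.balance += msg
            # Arrived after local recording but before this channel's marker.
            if p.recorded_state is not None and (src, dst) not in p.done:
                p.recording[(src, dst)].append(msg)

    def run_until_idle(self):
        # Deliver messages in a fixed order until every channel is empty.
        progress = True
        while progress:
            progress = False
            for (src, dst), q in self.chan.items():
                if q:
                    self.deliver(src, dst)
                    progress = True

    def snapshot_total(self):
        local = sum(p.recorded_state for p in self.procs)
        in_transit = sum(sum(m) for p in self.procs for m in p.recording.values())
        return local + in_transit


system = System([100, 100, 100])
system.transfer(0, 1, 30)
system.transfer(1, 2, 20)
system.start_snapshot(0)
system.transfer(2, 0, 10)
system.run_until_idle()
print(system.snapshot_total())  # 300
```

The recorded local states here are 70, 110, and 90, with 20 and 10 captured in transit, so the snapshot accounts for all 300 units even though no single instant had that exact configuration. That consistency is what lets a restarted application resume from the snapshot instead of from the beginning.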