2013
DOI: 10.1016/j.jpdc.2013.01.013

BlobCR: Virtual disk based checkpoint-restart for HPC applications on IaaS clouds

Abstract: Infrastructure-as-a-Service (IaaS) cloud computing is gaining significant interest in industry and academia as an alternative platform for running HPC applications. Given the need to provide fault tolerance, support for suspend-resume and offline migration, an efficient Checkpoint-Restart mechanism becomes paramount in this context. We propose BlobCR, a dedicated checkpoint repository that is able to take live incremental snapshots of the whole disk attached to the virtual machine (VM) instances. BlobCR aims t…

citations
Cited by 33 publications
(21 citation statements)
references
References 35 publications
“…However, we studied the ATCCp behavior compared to coordinated checkpointing CCp used in [1], [16], [35] and independent checkpointing ICp used in [15], [28], [36]. The goal is proofing that our approach ensures a strong consistency with the minimum cost (like the CCp) and with the minimum overhead (like the ICp).…”
Section: Analysis and Comparative Study
mentioning
confidence: 99%
“…The cloud provisions pools of computing resources as services via the internet using a pay-as-you-go price model that eliminates initial costly capital investments in hardware and infrastructure. Research and academic communities can leverage the benefit of the cloud price model for their computation-intensive applications that traditionally run in HPC environments [34], [35], [36], [37] such as Amazon Web Services' HPC offering [38] or science cloud initiatives [39].…”
mentioning
confidence: 99%
“…For example, a majority of high-performance computing (HPC) numerical simulations model the evolution of physical phenomena in time by using a bulk-synchronous approach. This involves a synchronization point at the end of each iteration in order to write intermediate output data about the simulation, as well as periodic checkpoints that are needed for a variety of tasks [5] such as migration, debugging, and minimizing the amount of lost computation in case of failures. Since many processes share the same storage (e.g., all processes on the same node share the same local disks), this behavior translates to periods of little I/O activity that are interleaved with periods of highly intensive I/O peaks.…”
Section: Introduction
mentioning
confidence: 99%
“…In [24], authors propose BlobCR, a checkpoint framework for High Performance Computing (HPC) applications on IaaS. Their approach is directed at both application and process checkpoint levels through a distributed checkpoint repository.…”
Section: Data Replication
mentioning
confidence: 99%