2008
DOI: 10.1007/s11227-008-0259-0

A fault-tolerant strategy for virtualized HPC clusters

Abstract: Virtualization is a common strategy for improving the utilization of existing computing resources, particularly within data centers. However, its use for high-performance computing (HPC) applications is currently limited, despite its potential both to improve resource utilization and to provide resource guarantees to its users. In this article, we systematically evaluate three major virtual machine implementations for computationally intensive HPC applications using various standard benchmarks. Using V…

citations
Cited by 15 publications
(3 citation statements)
references
References 36 publications
“…However, in general, MPI is rarely selected for developing real-time data processing systems because it does not provide standardized fault tolerance interfaces and semantics. Although extensive research (Rodríguez et al. 2007, p. 153; Walters & Chaudhary 2009; Hursey et al. 2011) has been conducted in this area, few available tools exist to help parallel programmers enhance their applications with fault tolerance support. Moreover, the exploitation of MPI is impeded by difficulties in software development.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…The resiliency of HPC platforms in the face of component failure has been a topic of significant work [4–10]. Although an abundance of checkpointing research has been performed on virtualized computing platforms [6, 8–11], these methods have not addressed heterogeneous systems including both virtual machines and externally hosted physical storage volumes.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Although an abundance of checkpointing research has been performed on virtualized computing platforms [6, 8–11], these methods have not addressed heterogeneous systems including both virtual machines and externally hosted physical storage volumes. In this paper, we offer an approach to resiliency that is applicable to these aggregated complex resources.…”
Section: Introduction (citation type: mentioning)
confidence: 99%