Proceedings IEEE International Conference on Cluster Computing CLUSTR-03 2003
DOI: 10.1109/clustr.2003.1253337

Availability prediction and modeling of high availability OSCAR cluster

Abstract: Since the initial introduction of Open Source Cluster Application Resources (OSCAR), this software package has been a well-accepted choice for building high performance computing systems. As it continues to be applied to mission-critical environments, high availability (HA) features need to be included in the OSCAR cluster. In this paper, we provide an HA solution for the OSCAR cluster. As a widely used technique in HA solutions, component redundancy is adopted to improve the system availability. Based o…
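
The abstract names component redundancy as the way to raise availability. As a rough illustration only, the sketch below uses the textbook steady-state formula A = MTTF / (MTTF + MTTR) and the standard parallel-redundancy formula for independent components. It is not the model from the paper, whose details are truncated above; the function names and the MTTF/MTTR figures are hypothetical.

```python
def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a single component is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming independent
    failures: the group is down only when every copy is down."""
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical figures for one head node; not taken from the paper.
    head = steady_state_availability(mttf_hours=1000.0, mttr_hours=10.0)
    pair = redundant_availability(head, copies=2)
    print(f"single head node: {head:.6f}")
    print(f"redundant pair:   {pair:.6f}")
```

The independence assumption is optimistic: correlated failures, such as a shared power supply or network switch, would reduce the benefit of adding a second copy.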

Cited by 32 publications (10 citation statements)
References 3 publications
“…We could find only a few approaches that address availability, such as [56], [57] and [58]. The approach presented in [56] is mainly concerned with architectural design and not analysis.…”
Section: Comparison Of the Selected Methods (mentioning)
confidence: 99%
“…The approach presented in [56] is mainly concerned with architectural design and not analysis. The Laprie and Kanoun model [57] also addresses the problem of modeling reliability and availability with respect to various classes of faults.…”
Section: Comparison Of the Selected Methods (mentioning)
confidence: 99%
“…More attention has been paid to modeling the characteristics of resource availability and many researches [5, 8-10, 21-23] show that strong temporal and spatial correlations of failure events and resource failures follow the Weibull, Hyperexponential and Pareto distributions with different parameters rather than a Poisson distribution. Oliner et al. [4] and Zhang et al. [5] evaluate three application-level periodic checkpoint heuristics, checkpointing all jobs, long jobs and big jobs, in large-scale cluster system using temporal or spatial information of resource availability.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, there has been little prior work on estimating arrivals to supercomputing clusters (although see [23], [24]). Instead we will use a technique that has been developed for estimating a different notion of "workload", namely the number of ongoing jobs.…”
Section: B Benchmark Models (mentioning)
confidence: 99%