Availability prediction and modeling of high mobility OSCAR cluster

Leangsuksun; Shen; Liu, Tong; Song, Hertong; Scott, Alexander P.

doi:10.1109/clustr.2003.1253337

Cited by 32 publications

(10 citation statements)

References 3 publications

“…We could find only a few approaches that address availability, such as [56], [57] and [58]. The approach presented in [56] is mainly concerned with architectural design and not analysis.…”

Section: Comparison Of the Selected Methods (mentioning)

confidence: 99%

Survey of reliability and availability prediction methods from the viewpoint of software architecture

2007

Many future software systems will be distributed across a network, extensively providing different kinds of services for their users. These systems must be highly reliable and provide services when required. Reliability and availability must be engineered into software from the onset of its development, and potential problems must be detected in the early stages, when it is easier and less expensive to implement modifications. The software architecture design phase is the first stage of software development in which it is possible to evaluate how well the quality requirements are being met. For this reason, a method is needed for analyzing software architecture with respect to reliability and availability. In this paper, we define a framework for comparing reliability and availability analysis methods from the viewpoint of software architecture. Our contribution is the comparison of the existing analysis methods and techniques that can be used for reliability and availability prediction at the architectural level. The objective is to discover which methods are suitable for the reliability and availability prediction of today's complex systems, what the shortcomings of the methods are, and which research activities need to be conducted in order to overcome these identified shortcomings. The comparison reveals that none of the existing methods entirely fulfill the requirements that are defined in the…

Communicated by Oystein Haugen. A. Immonen · E. Niemelä

“…The approach presented in [56] is mainly concerned with architectural design and not analysis. The Laprie and Kanoun model [57] also addresses the problem of modeling reliability and availability with respect to various classes of faults.…”

Section: Comparison Of the Selected Methods (mentioning)

confidence: 99%

“…More attention has been paid to modeling the characteristics of resource availability and many researches [5, 8-10, 21-23] show that strong temporal and spatial correlations of failure events and resource failures follow the Weibull, Hyperexponential and Pareto distributions with different parameters rather than a Poisson distribution. Oliner et al [4] and Zhang et al [5] evaluate three application-level periodic checkpoint heuristics, checkpointing all jobs, long jobs and big jobs, in large-scale cluster system using temporal or spatial information of resource availability.…”

Section: Related Work (mentioning)

confidence: 99%

An optimistic checkpoint mechanism based on job characteristics and resource availability for dynamic grids

Tao, Jin, et al. 2011

Wuhan Univ. J. Nat. Sci.

In this paper, based on job characteristics and resource availability, an optimistic checkpoint mechanism for dynamic grids (OCM4G) is proposed. It can determine whether to checkpoint a given job running on a given resource node and establish optimal aperiodic checkpoint intervals by applying the knowledge of job characteristics and resource availability. We evaluate OCM4G over a real grid environment (ChinaGrid), and the results show that OCM4G achieves better performance than the periodic checkpoint and the analytical method of calculating aperiodic checkpoint intervals.

“…However, there has been little prior work on estimating arrivals to supercomputing clusters (although see [23], [24]). Instead we will use a technique that has been developed for estimating a different notion of "workload", namely the number of ongoing jobs.…”

Section: B Benchmark Models (mentioning)

confidence: 99%

Exploiting per user information for supercomputing workload prediction requires care

Dinh, Andrew, Branch 2013

2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing

Abstract: Efficient management of supercomputing facilities requires estimates of future workload based on past user behaviour. For supercomputers with large numbers of users, aggregate user behaviour is commonly assumed to be best for predicting future workloads; however, for systems with smaller numbers of users, the question arises as to whether it is still suitable or whether benefits can be derived from monitoring individual user behaviour to predict future workload. We compare using individual user behaviour, aggregate user behaviour and a hybrid approach where we track heavy users individually and cluster aggregate light users into a small number of clusters. We find that the hybrid approach produces the best results in both mean absolute error and mean squared error. However, treating all users separately provides slightly worse predictions. We also introduce a new approach to prediction based on the hazard function which is a significant improvement on previously used schemes based on autoregressive models. The schemes are investigated numerically using a two-year workload trace from a supercomputer with a population of 136 users.
