Proceedings of the 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems - SIGMETRICS '02
DOI: 10.1145/511361.511362

Improving cluster availability using workstation validation

Cited by 23 publications
(45 citation statements)
“…From the large body of research already dedicated to modeling the availability of parallel and distributed computer systems-see [19,17,18,9] and the references within-, relatively little attention has been given to space-correlated errors and failures [19,4,11], despite their reported importance [8,17]. The main differences between this work and the previous work on space-correlated errors and failures is summarized in Table 12.…”
Section: Related Work (mentioning)
confidence: 98%
“…Since the scale and complexity of contemporary distributed systems make the occurrence of failures the rule rather than the exception, many fault tolerant resource management techniques have been designed recently [8,3,17]. The deployment of these techniques and the design of new ones depend on understanding the characteristics of failures in real systems.…”
Section: Introduction (mentioning)
confidence: 99%
“…Xu et al [40] performed a study of error logs collected from a heterogeneous distributed system consisting of 503 PC servers. Heath et al [13] considered failure data from three different clustered servers, ranging from 18 workstations to 89 workstations. Castillo et al [7], Iyer et al [15] and Meyer et al [24] have explored the effects of workload on different types of computer system failures.…”
Section: Failure Injection (mentioning)
confidence: 99%
“…This includes the fitting of failure data to Weibull, lognormal and other specific distributions, each with different parameter settings, under the assumption of independent and identically distributed failures [20,19,13]. Other studies have demonstrated that the sequence of failures on some computer systems are correlated in various ways and that the failures tend to occur in bursts [37,36,40,28].…”
Section: Failure Injection (mentioning)
confidence: 99%