Proceedings of the 2007 ACM/IEEE Conference on Supercomputing, 2007
DOI: 10.1145/1362622.1362687

Performance under failures of high-end computing

Cited by 29 publications (23 citation statements)
References 13 publications
“…Requests in service are subject to failures, with failure rate α. Both service and failure times are exponentially distributed, a common assumption in reliability engineering [24,23]. In case of a failure, the request currently in service is lost, but the server itself is not affected, and continues to serve the next request that enters.…”
Section: Reference Model (mentioning)
confidence: 99%
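The model in the statement above can be illustrated with a minimal sketch. It assumes that service and failure times are independent exponentials, that the service rate is called `mu` (a name not given in the excerpt; only the failure rate α is named), and that a request is lost when the failure clock fires before service completes. Under those assumptions, the probability that a request finishes is mu / (mu + alpha), which the simulation checks empirically. This is an illustration of the stated assumptions, not the citing paper's own code.

```python
import random


def completion_probability(mu: float, alpha: float) -> float:
    """Probability that service finishes before a failure occurs.

    Assumes independent exponential service (rate mu, a hypothetical name)
    and failure (rate alpha) times, as in the quoted reference model.
    """
    return mu / (mu + alpha)


def simulate(mu: float, alpha: float, n: int = 100_000, seed: int = 0) -> float:
    """Estimate the same probability by sampling both clocks n times."""
    rng = random.Random(seed)
    completed = 0
    for _ in range(n):
        service_time = rng.expovariate(mu)     # time until service would finish
        failure_time = rng.expovariate(alpha)  # time until the next failure
        if service_time < failure_time:
            completed += 1                     # request served before any failure
    return completed / n


if __name__ == "__main__":
    mu, alpha = 4.0, 1.0  # illustrative rates, not taken from the source
    print("analytic :", completion_probability(mu, alpha))
    print("simulated:", simulate(mu, alpha))
```

Because the server itself is unaffected by a failure in this model, each request can be treated independently, which is why a per-request race between two exponential clocks is enough for the sketch.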
“…This is a common assumption [10,48], and an appropriate starting case. It is not a universal assumption however [38,51], and we address alternate distributions in Sections 5 and 6.…”
Section: Exponentially-distributed Node Failures (mentioning)
confidence: 99%
“…However, checkpoint sizes are increasing faster than checkpoint bandwidths [38]. It has been shown that the collision of these trends will render Exascale systems as "useless" due to checkpoint/restart overheads [12], and thus it is time for new reliability strategies to be explored [38,13,48].…”
Section: Introduction (mentioning)
confidence: 99%
“…In [27], we have presented a performance model to estimate the mean, variance and distribution of a single sequential task computation time. We adopt this model to estimate the computation time of each subtask in the DAG.…”
Section: B Modeling Of Subtask Computation Time (mentioning)
confidence: 99%
“…We first predict the performance of subtasks based on our previous work [27], in which all subtasks are independent. This prediction provides the prediction of subtasks under one layer of a general DAG.…”
Section: Introduction (mentioning)
confidence: 99%