18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
DOI: 10.1109/ipdps.2004.1303244

A fault tolerant protocol for massively parallel systems

Cited by 26 publications (22 citation statements)
References 18 publications
“…In order to calculate the parameter ( ; ) X P X S , we should enumerate the number of all existing paths facing the X-shape fault pattern and divide them by the number of all existing paths in the connected R × C torus network. This probability is expressed formally as $P_{hit} = \frac{\text{number of minimal paths crossing the fault region}}{\text{number of all minimal paths existing in the network}}$ (1). The following theorem provides the total number of paths with minimal length in the network. ( , )…”
Section: Remark: a Path Facing The Fault-pattern Means That There Exi… (mentioning)
confidence: 99%
“…To be able to adapt with faults without serious degradation of the service, networks and routing protocols have to be set up so that they are fault-tolerant. Several recent studies address fault-tolerance in a diverse range of systems and applications [1][2][3][4][5][6][7][8][9][10][11][12]. Almost all of the performance evaluation studies for functionality of these systems, however, have made use solely of simulation experiments.…”
Section: Introduction (mentioning)
confidence: 99%
“…In recent years, many researchers have addressed to several issues in the field of fault-tolerance and reliability analysis of large scale parallel and distributed systems [4][5][6][7][8][9][10][11][12][13][14][15][16]. These researches span a diverse range of systems and applications such as massively parallel processors [8], cluster-based systems [9], mobile systems [10], sensor networks [11], and more recently network on chip [1].…”
Section: Introduction (mentioning)
confidence: 99%
“…These researches span a diverse range of systems and applications such as massively parallel processors [8], cluster-based systems [9], mobile systems [10], sensor networks [11], and more recently network on chip [1].…”
Section: Introduction (mentioning)
confidence: 99%
“…Charm++ consists of a variety of broadly applicable high-performance tools integrated in a single run-time system. Virtualization techniques are employed for hiding latency via message-driven execution [2], automatic application-independent load balancing [3], automatic communication optimization [4], check-pointing [5], fault tolerance [6,7], and performance visualization and analysis [8]. All of these tools help make a parallel code run better, but even with Charm++, developing a new parallel program still requires many hours of effort.…”
Section: Introduction (mentioning)
confidence: 99%