1998
DOI: 10.1002/(sici)1097-024x(19980725)28:9<981::aid-spe182>3.0.co;2-x

Design, implementation and evaluation of ICARE: an efficient recoverable DSM

Abstract: In the light of the increasing throughput of local area networks, Networks Of Workstations (NOWs) which provide a Distributed Shared Memory (DSM) have become a convenient and cheaper alternative to parallel architectures in the framework of parallel scientific applications. However, the probability that a failure occurs in such a system made up of a large number of components must not be neglected, especially for long-running applications. This paper presents the design, implementation and performance e…

Cited by 18 publications (20 citation statements)
References 19 publications
“…Care must be taken that the checkpoint is saved to a storage medium that can be accessed after a failure occurs. In [2], checkpoints are saved in the memory of another node rather than to disk thereby reducing the amount of time that is required to save the checkpoint. We have adopted this approach for our simulations.…”
Section: Fault Tolerant DSM on the SOME-Bus (mentioning)
confidence: 99%
“…It is not necessary to recreate an entire copy of the memory at each checkpoint interval. Instead the checkpoints can be incrementally updated since it is only necessary to save the data that has been modified since the last checkpoint [2]. After the processors are synchronized in preparation for taking a checkpoint, all of the cache controllers write back any exclusively owned cache blocks.…”
Section: Fault Tolerant DSM on the SOME-Bus (mentioning)
confidence: 99%
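
The incremental scheme described in the statement above, combined with keeping the checkpoint in another node's memory, can be illustrated with a minimal sketch. This is a toy single-process model, not the implementation from ICARE or from the citing paper: the class `CheckpointedMemory`, its methods, and the `remote_copy` list (which stands in for a peer node's memory) are all hypothetical names chosen for illustration.

```python
class CheckpointedMemory:
    """Toy page store that tracks which pages changed since the last checkpoint."""

    def __init__(self, num_pages, page_size=4):
        self.pages = [bytes(page_size) for _ in range(num_pages)]
        self.dirty = set()
        # Assumption: this list stands in for the backup copy held in the
        # memory of another node; a real system would send pages over the network.
        self.remote_copy = list(self.pages)

    def write(self, page_no, data):
        """Modify a page and remember that it must be saved at the next checkpoint."""
        self.pages[page_no] = bytes(data)
        self.dirty.add(page_no)

    def checkpoint(self):
        """Copy only the pages modified since the last checkpoint; return their numbers."""
        saved = sorted(self.dirty)
        for page_no in saved:
            self.remote_copy[page_no] = self.pages[page_no]
        self.dirty.clear()
        return saved

    def recover(self):
        """Roll back to the last checkpoint after a (simulated) failure."""
        self.pages = list(self.remote_copy)
        self.dirty.clear()


mem = CheckpointedMemory(num_pages=8)
mem.write(2, b"abcd")
mem.write(5, b"wxyz")
first = mem.checkpoint()   # only pages 2 and 5 are copied
mem.write(2, b"oops")      # modification made after the checkpoint
mem.recover()              # page 2 returns to its checkpointed contents
```

In this sketch the cost of a checkpoint grows with the number of modified pages rather than with the total memory size, which is the point the citing authors make.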
“…Saving large amounts of data to disk can be extremely time consuming. Another approach that has been proposed [3] is to utilize the memory of another node to store checkpoints. This approach provides the necessary requirement to tolerate a single node failure, or multiple node failures where all copies of the checkpoint do not reside on the failed nodes.…”
Section: Fault Tolerance and Distributed Shared Memory On The Some (mentioning)
confidence: 99%
“…Much research is currently being conducted to use these mechanisms for cache coherence for the benefit of fault tolerance in the form of checkpointing [3][4][5].…”
Section: Fault Tolerance and Distributed Shared Memory On The Some (mentioning)
confidence: 99%