1996
DOI: 10.1109/12.543705

An architecture for tolerating processor failures in shared-memory multiprocessors

Abstract: This paper focuses on the problem of fault tolerance in shared memory multiprocessors, and describes an architecture designed for transparently tolerating processor failures. The Recoverable Shared Memory (RSM) is the novel component of this architecture, providing a hardware supported backward error recovery mechanism which minimizes the propagation of recovery when a processor fails. The RSM permits a shared memory multiprocessor to be constructed using standard caches and cache coherence protocols, and does…

Cited by 27 publications (21 citation statements)
References 28 publications
“…While a line is in backup state its data is considered invalid and will be used only if required for recovery. Hence, the cache will not be able to read from that line². Also, when a line enters in a backup state the lost data timeout will start and will stop once the backup state is abandoned.…”
Section: Avoiding Data Loss (mentioning)
confidence: 99%
“…Once C1 receives it, it transitions to a normal modified state. A cache line in a backup state will be used for recov- ² It is possible for a cache to receive valid data and a token before abandoning a backup state, only if the data message was not lost. In that case, it will be able to read from that line and the line will be transitioned to an intermediate backup state until the ownership acknowledgement is received.…”
Section: Avoiding Data Loss (mentioning)
confidence: 99%
“…5. We only present the results for the static web server, but these results are qualitatively the same for all of our other workloads.…”
Section: Discussion (mentioning)
confidence: 90%
“…Logging due to transferring cache ownership, however, does not incur additional bandwidth, since the cache line must be read anyway. In Figure 6, for the static web server workload⁵, we plot this frequency as a function of the checkpoint interval. Both axes use log scales.…”
Section: Sensitivity Analyses (mentioning)
confidence: 99%