38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05)
DOI: 10.1109/micro.2005.8

A Mechanism for Online Diagnosis of Hard Faults in Microprocessors

Abstract: We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field…

Introduction: As technological trends continue to lead toward smaller device and wire dimensions in integrated circuits, the probability of hard (permanent) faults in microprocessors increases. These faults may be introduced during fabrication, as defects, or they may occur during the operational lifetime of the microprocessor. Well-known physical phenomena that lead to operational hard faults are gate oxide…

Cited by 84 publications (55 citation statements)
References 33 publications
“…An existing scheme that can be used for this purpose is the hard-fault detection and diagnosis framework described in [36], involving a low-cost hardware checker [38] and saturating counters. This section provides a brief outline of the methodology in [36] for the sake of completeness. The work here has no contributions towards this end.…”
Section: Architectural Pre-requisites (mentioning)
confidence: 99%
“…If an instruction result is found to be erroneous, the faulty FDU in use is recorded by incrementing a saturating counter corresponding to each and every FDU used by the instruction. If the fault-count for an FDU rises beyond a threshold within a pre-specified time interval, the fault in that unit is considered to be permanent [36]. Experimental results indicate that most hard faults can be suitably detected and diagnosed within a few thousand instructions after the faults develop.…”
Section: Online Detection and Diagnosis (mentioning)
confidence: 99%
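
The counter-and-threshold diagnosis described in the statement above can be illustrated with a minimal Python sketch. It is not the implementation from the cited paper: the class name, the unit labels, the fixed-window reset policy, and the numeric defaults (threshold, counter ceiling, interval length) are all assumptions chosen for illustration. The sketch assumes that a checker has already judged each instruction result as correct or erroneous.

```python
from collections import defaultdict


class HardFaultDiagnoser:
    """Illustrative sketch: per-unit saturating counters with a threshold.

    All parameter values are hypothetical and not taken from the paper.
    """

    def __init__(self, threshold=3, counter_max=7, interval=10_000):
        self.threshold = threshold      # counts above this mark a unit as permanently faulty
        self.counter_max = counter_max  # saturation ceiling of each counter
        self.interval = interval        # window length, measured in instructions
        self.counters = defaultdict(int)
        self.window_start = 0
        self.permanent = set()          # units diagnosed as having a hard fault

    def record(self, instr_index, units_used, result_ok):
        """Record one checked instruction and return newly diagnosed units."""
        # Assumed policy: clear every counter once the time interval has elapsed,
        # so that sporadic (transient) errors do not accumulate indefinitely.
        if instr_index - self.window_start >= self.interval:
            self.counters.clear()
            self.window_start = instr_index

        if result_ok:
            return set()

        newly_diagnosed = set()
        # An erroneous result increments the counter of every unit the instruction used.
        for unit in units_used:
            self.counters[unit] = min(self.counters[unit] + 1, self.counter_max)
            if self.counters[unit] > self.threshold and unit not in self.permanent:
                self.permanent.add(unit)
                newly_diagnosed.add(unit)
        return newly_diagnosed


# Usage: a unit that keeps appearing in erroneous instructions crosses the threshold,
# while units that appear only once do not.
diagnoser = HardFaultDiagnoser(threshold=2, counter_max=3, interval=100)
first = diagnoser.record(0, ["alu0", "rob3"], result_ok=False)
second = diagnoser.record(1, ["alu0", "rob5"], result_ok=False)
third = diagnoser.record(3, ["alu0", "rob7"], result_ok=False)
```

Because every unit used by a failing instruction is charged, healthy units also collect counts; in this sketch only the unit common to repeated failures exceeds the threshold within the window.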
“…One proposal for run-time isolation is BlackJack [20], which exploits simultaneously-redundant threads on an SMT, previously used to detect soft errors, to detect defects. Bower et al in [3] propose using DIVA-checkers, small auxiliary cores that check committed instructions [1], for defect isolation. Constantinides et al in [6] propose a virtualization layer between the operating system and the hardware to introduce periodic special instructions for defect isolation.…”
Section: Defect Detection and Isolation (mentioning)
confidence: 99%
“…The final phase (Step 3) of the test routine uses the ACE get instruction to read and validate the test response from the scan state. If a test pattern fails to produce the correct response at the end of Step 3, the test program indicates which part of the hardware is defective and disables it through system reconfiguration [27,8]. If necessary, the test program can run additional test patterns to narrow down the defective part to a finer granularity.…”
Section: Ace-based Online Testing (mentioning)
confidence: 99%