2012 Design, Automation & Test in Europe Conference & Exhibition (DATE) 2012
DOI: 10.1109/date.2012.6176621

Reli: Hardware/software Checkpoint and Recovery scheme for embedded processors

Abstract: Checkpoint and Recovery (CR) allows computer systems to operate correctly even when compromised by transient faults. While many software and hardware systems for CR exist, they are usually too large, too slow, or require major modifications to the software or extensive modifications to the caching schemes. In this paper, we propose a novel error-recovery management scheme, which is based upon re-engineering the instruction set. We take the native instruction set of the processor and enh…
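The general checkpoint-and-recovery idea named in the abstract can be illustrated with a minimal software-only sketch. It is not Reli's mechanism, which the abstract says works by re-engineering the instruction set. The names `TransientFault`, `run_with_checkpoints`, and `make_flaky_increment` are hypothetical, and the fault is simulated by raising an exception.

```python
import copy


class TransientFault(Exception):
    """Hypothetical signal that a (simulated) transient fault was detected."""


def run_with_checkpoints(state, steps, max_retries=3):
    """Apply each step to `state`; on a fault, roll back and re-execute the step."""
    for step in steps:
        checkpoint = copy.deepcopy(state)  # checkpoint: save state before the step
        for _attempt in range(max_retries + 1):
            try:
                step(state)
                break  # step completed without a detected fault
            except TransientFault:
                # recovery: restore the saved state in place, then retry
                state.clear()
                state.update(copy.deepcopy(checkpoint))
        else:
            # every attempt failed, so the fault is probably not transient
            raise RuntimeError("step kept failing after rollback")
    return state


def make_flaky_increment():
    """Build a step that corrupts the state and faults on its first call only."""
    calls = {"n": 0}

    def step(state):
        calls["n"] += 1
        state["acc"] += 1
        if calls["n"] == 1:
            state["acc"] = -999  # simulated corruption
            raise TransientFault("simulated bit flip")

    return step


if __name__ == "__main__":
    result = run_with_checkpoints(
        {"acc": 0},
        [make_flaky_increment(), lambda s: s.update(acc=s["acc"] * 10)],
    )
    print(result)
```

The first step faults once, is rolled back to `acc = 0`, and succeeds on re-execution, so the corrupted value never reaches the second step.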

Cited by 7 publications (4 citation statements)
References 30 publications
“…Prior works [35,45] have also explored symptom-based soft-error detection/recovery mechanisms, but they provide low soft-error coverage, since they rely on coarse-grain detectors, such as fatal-traps, hangs, panics, and so on. Under hardware-based resilience schemes [35,39,45], the solutions enable redundancy mechanisms, such as TLR [4,11,17,27] or nMR [41,44] to provide soft-error protection. For instance, prior work [44] focuses on applying DMR on a multicore (GPU) setting, where it redundantly executes two copies of the same application, and delivers high soft-error coverage by performing cross checks in a duplicated thread.…”
Section: Related Work (mentioning)
confidence: 99%
“…Authors in [7] reconfigure the redundancy of functional units of a DSP processor into an m-way replication; however, the execution-time of a program will be duplicated due to assigning some functional units to fault-tolerance. Recently, a hardware/software CR-based scheme, called Reli, has been proposed in [8] which is based on elaborating microinstructions with additional micro-operations to facilitate check-pointing.…”
Section: Related Work (mentioning)
confidence: 99%
“…The main novel feature of the presented recovery method is isolation of the faulty functional unit from the fault-free ones for one clock-cycle, referred to as freezing, and re-executing the faulty part of the instruction. Another novel feature is the minimum amount of information needed to be stored in each functional unit; this decreases the recovery overhead to only one clock-cycle, while a typical recovery mechanism takes 16 clock-cycles for the CR-based mechanism [8]. Moreover, the speed of the enriched processor is identical to the performance of the original processor, as long as no SET is present in the system.…”
Section: Recovery Mechanism In Combinational Logics (mentioning)
confidence: 99%
“…A hardware/software approach for detecting and recovering from errors is proposed in [317]. The fundamental idea of this approach is to re-engineer the instruction set.…”
Section: Hybrid Approach (mentioning)
confidence: 99%