2019
DOI: 10.1145/3302255

A Survey on Multithreading Alternatives for Soft Error Fault Tolerance

Abstract: Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce higher soft error rates. This trend makes reliability a primary design constraint for computer systems. Redundant multithreading (RMT) makes use of parallelism in modern systems by employing thread-level time redundancy for fault detection and recovery. RMT can detect faults by running identical copies of the program as separate threads in parallel execution units with identical inputs and comparing their outputs. In this…

citations
Cited by 26 publications
(20 citation statements)
references
References 80 publications
“…But, the addition of these diagnosis techniques in the hardware, can also have an impact on silicon die and lead to cost increase, power overheads and reduced performance. So, the combination of hardware and software techniques combined at different abstraction levels are usually required to detect complex hardware faults such as MBU and transient faults, as supported hardware techniques alone might not be sufficient [34,39,56,48,36]. However, cross-requirements research contributions that aim at reconciling reliability, DC and time predictability are still scarce (e.g., [57,58]).…”
Section: Discussion (mentioning)
confidence: 99%
“…Therefore, devices targeting industrial markets with high reliability requirements are prone to include such features as oppose to high volume markets that have lower reliability requirements [38]. Because of this, application specific or generic software fault-tolerance techniques can be used in order to improve hardware reliability, e.g., multithreading for soft error fault-tolerance [56], algorithmic based fault tolerance [46], software redundant execution (e.g., software application, replicated instructions and check inserted by the compiler) [46], combination of software and hardware techniques with WCET guarantees [138], software level thermal co-management [36,137], application specific techniques (e.g., [4,6]).…”
Section: Reliability (mentioning)
confidence: 99%
“…With its own advantages, the DDS has gradually developed into the preferred storage system for various types of data storage today [2]. When the dynamic distributed storage system faces the loss of disk data, it is necessary to introduce a redundant fault-tolerant mechanism to ensure that the system can work normally [3]. In order to provide users with reliable file storage services, dynamic distributed storage systems need to adopt fault-tolerant technology to improve file availability [4][5].…”
Section: Introduction (mentioning)
confidence: 99%
“…The extra processing units of these systems can be used to accelerate software computation strategies aimed at mitigating or detecting soft errors, nevertheless, the majority of the existing software protection techniques are not designed to take advantage of these resources. Recent works, such as [8], point to distribute the replica computation across different processing units as a way to gain efficiency without losing reliability. Simultaneous Muti-Threading (SMT) is a long ago defined concept that has been used along with the concept of Sphere-of-Replication (SoR) to achieve reliability [9].…”
Section: Introduction (mentioning)
confidence: 99%