Transient faults have become an increasing concern in the past few years, as the smaller geometries of newer, highly miniaturized silicon manufacturing technologies have brought to the mass market failure mechanisms traditionally confined to niche markets such as electronic equipment for avionic, space, or nuclear applications. This chapter presents the origin of transient faults, discusses their propagation mechanism, outlines the models devised to represent them, and finally discusses the state-of-the-art design techniques that can be used to detect and correct them. The concepts of hardware, data, and time redundancy are presented, and their implementations to cope with transient faults affecting storage elements, combinational logic, and IP cores (e.g., processor cores) typically found in a System-on-Chip are discussed.
A Single Event Transient (SET) is the non-destructive event that takes place when the parasitic current produces glitches on the values of nets in the circuit that are compatible with the noise margins of the technology, thus resulting in the temporary modification of the value of the nets from 0 to 1, or vice versa. Among Single Event Effects (SEEs), Single Event Latch-up (SEL) is the most worrisome, as it corresponds to the destruction of the device; it is therefore normally addressed by means of SEL-aware layout of silicon cells, or by current sensing and limiting circuits. Single Event Upsets (SEUs), Multiple Bit Upsets (MBUs), and SETs can be tackled in different ways, depending on the market the application targets. When vertical, high-budget applications are considered, such as electronic devices for telecom satellites, SEE-immune manufacturing technologies can be adopted, which are by construction immune to SEUs, MBUs, and SETs, but whose costs are prohibitive for any other market.
When budget-constrained applications are considered, from electronic devices for space exploration missions to automotive and commodity applications, SEUs, MBUs, and SETs should be tackled by adopting fault detection and compensation techniques. These techniques allow dependable systems (i.e., systems where SEEs have a negligible impact on the application end user) to be developed on top of intrinsically non-dependable technologies (i.e., technologies that can be subject to SEUs, MBUs, and SETs) whose manufacturing costs are affordable. Different types of fault detection and compensation techniques have been developed in the past years, based on the well-known concepts of resource, information, or time redundancy (Pradhan, 1996).
In this chapter we first look at the source of soft errors, by presenting some background on radioactive environments and then discussing how soft errors can be seen at the device level. We then present the most interesting mitigation techniques, organized according to the component they target: processor, memory module, and random logic. Finally, we draw some conclusions.
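As a purely illustrative aside, the resource, information, and time redundancy concepts cited in this introduction can be mimicked with a toy software model. The sketch below is not taken from the chapter: all function names are hypothetical, a transient fault is assumed to be a single bit flip in an integer word, and the three mechanisms shown (majority voting over three replicas, a single even-parity bit, and duplicated execution with comparison) are only the simplest representatives of each class.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def flip_bit(word: int, bit: int) -> int:
    """Model a transient fault as a single bit flip (an assumption of this sketch)."""
    return word ^ (1 << bit)


def majority_vote(replicas: Sequence[int]) -> int:
    """Resource redundancy: return the value produced by at least two of three replicas."""
    value, count = Counter(replicas).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among replicas")
    return value


def with_parity(word: int) -> Tuple[int, int]:
    """Information redundancy: pair a data word with its even-parity bit."""
    return word, bin(word).count("1") % 2


def parity_matches(word: int, parity: int) -> bool:
    """Check a stored word against its parity bit; False signals a detected error."""
    return bin(word).count("1") % 2 == parity


def fault_detected_by_reexecution(compute: Callable[[], int]) -> bool:
    """Time redundancy: run the same computation twice and compare the results."""
    first = compute()
    second = compute()
    return first != second
```

For example, with the word 0b1010, `majority_vote([0b1010, flip_bit(0b1010, 2), 0b1010])` returns 0b1010, so the corrupted replica is masked. Parity only detects the error: after `word, p = with_parity(0b1010)`, the call `parity_matches(flip_bit(word, 0), p)` returns `False`, but the original value cannot be recovered from a single parity bit. Re-execution likewise detects a fault that affects only one of the two runs, at the price of doubling the execution time.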
BACKGROUND
The purpose of this section is to present an overview of the radioactive environments, to introduce the reader to the physical roots of soft errors. Afterwards, SEEs resulting from the interaction of ionizing radiation with th...