Space processing applications deployed on SRAMbased Field Programmable Gate Arrays (FPGAs) are vulnerable to radiation-induced Single Event Upsets (SEUs). Compared with the well-known SEU mitigation solution-Triple Modular Redundancy (TMR) with configuration memory scrubbing-TMR with module-based error recovery (MER) is notably more energy efficient and responsive in repairing soft-errors in the system. Unfortunately, TMR-MER systems also need to resort to scrubbing when errors occur in sub-components, such as nets, which are not recovered by MER. This paper addresses this problem by proposing a fine-grained module-based error recovery technique that without additional system hardware can localize and correct errors that classic MER fails to do. We evaluate our proposal via a fault-injection campaign on a Xilinx Artix-7 application circuit and compare the reliability, the error correction latency and the energy cost of repairing errors, of our proposal with those of a conventional MER approach and with periodic and on-demand blind scrubbing. We find the reliability of our proposal to be the highest and the energy expenditure to be the lowest amongst those methods considered.
High-reliability SRAM-based Field Programmable Gate Array (FPGA) designs that are deployed in space are commonly triplicated to mask Single Event Upsets (SEUs) and employ either scrubbing or modular reconfiguration to recover from radiation-induced configuration memory errors. Scrubbing benefits from vendor support and clears errors anywhere in the design but suffers from longer recovery times and higher energy use. Module-based error recovery is more energy efficient and responsive but repairs only corrupted TMR modules, leaving the supporting parts of the design such as pins or routing that are not included in the modules unrecovered. This paper proposes and assesses a hybrid technique we refer to as Frameand Module-based Error Recovery (FMER) that uses modular reconfiguration to repair faulty TMR modules and otherwise scrubs the supporting parts of the design. We derive and compare the reliability, availability and power consumption of TMR-based System on Chip (SoC) designs that incorporate FMER, modular reconfiguration alone, blind scrubbing and no recovery. Our results reveal that FMER has the highest reliability and availability of the studied techniques in high radiation environments or when a mission's energy budget is limited.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.