Transient and permanent errors pose a major challenge as transistor density continues to grow. Manufacturing defects and process variations lead to permanent faults, thereby lowering processor yields. In the arithmetic logic unit, a single permanent fault results in the complete failure of the processor. Low-energy neutrons and alpha particles of cosmic origin induce transient errors by altering the state of a transistor. In view of these implications, we propose a new, reliable, fault-tolerant, low-cost 32-bit one instruction core (OIC) for a multicore system. “Low cost” here means low power and small area. Notably, the 32-bit OIC provides fault-free execution by using triple-redundant subtractors together with one additional subtractor. It repeatedly executes a single instruction, subleq, to emulate the faulty instructions migrated to it by other cores. Hardware synthesis is undertaken to estimate leakage power, dynamic power, critical path delay, and area. The low-power 32-bit OIC consumes 1.3 mW and occupies a die area of 8,122 μm². Although it adds performance overhead, the 32-bit OIC outperforms its competitors with regard to reliability, area, power, and critical path delay. In addition, a probabilistic estimate that uses the link vulnerability factor as a new parameter is introduced to assess the effect of soft errors, or transient faults, on interconnect wires; these effects are then quantitatively analyzed to illuminate the resilience of the 32-bit OIC hardware structure. Finally, we propose new design alternatives, alongside existing ones, for heterogeneous multicore systems in order to develop low-cost solutions with a primary focus on reliability.
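The triple-redundant subtractors mentioned above can be illustrated with a short software model. This is only a sketch under stated assumptions: the bitwise 2-of-3 majority voter is the standard triple modular redundancy scheme, not necessarily the voter used in the OIC; the names `sub32`, `majority_vote`, and `tmr_subtract` and the `fault_masks` parameter are hypothetical; and the role of the additional fourth subtractor is not described in the abstract, so it is not modeled here.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits, matching the 32-bit core


def sub32(x, y):
    """32-bit two's-complement subtraction (software stand-in for one subtractor)."""
    return (x - y) & MASK32


def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def tmr_subtract(x, y, fault_masks=(0, 0, 0)):
    """Run three subtractor copies and vote on their outputs.

    fault_masks is a hypothetical fault-injection hook: each mask is XORed
    into one copy's output to flip the selected bits.
    """
    outputs = [sub32(x, y) ^ mask for mask in fault_masks]
    return majority_vote(*outputs)


# One faulty copy (bit 2 flipped) is outvoted by the two healthy copies.
result = tmr_subtract(10, 3, fault_masks=(0, 0x4, 0))
```

In this model a fault confined to one copy is always masked, whereas faults that flip the same bit in two copies are not. Simple voting therefore tolerates only a single faulty subtractor at a time.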
Placing billions of transistors on a chip has made it possible to integrate many cores, which brings challenges such as increased power dissipation, thermal dissipation, faults in the circuits, and reliability issues. Existing approaches explore redundancy-based solutions for fault tolerance at the core level, thread level, micro-architectural level, and software level. Core-level techniques that improve the lifetime reliability of multi-core systems with asymmetric cores (large and small cores) have gained momentum and attention among a large number of researchers. Motivated by these observations, this chapter proposes a multi-core system using one instruction cores (MCS-OIC). The MCS-OIC is an asymmetric multi-core architecture with a MIPS core as the conventional core and OICs as the warm standby redundant cores. An OIC executes only one instruction, ‘subleq’ (subtract and branch if less than or equal to zero). When one of the functional units (i.e., the ALU) of any conventional core fails, the opcode of the affected instruction is sent to the OIC. The OIC decodes the instruction opcode and emulates the faulty instruction by repeated execution of the ‘subleq’ instruction, thus providing fault tolerance. To evaluate the idea, the OIC is synthesized for both ASIC and FPGA targets. The performance implications of the OICs are evaluated at the instruction and application levels. Yield is estimated for various configurations of the multi-core system using OICs.
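To make the emulation idea concrete, the following sketch shows how an ADD can be built from three subleq instructions. It assumes the common textbook encoding, in which each instruction is a triple (a, b, c) meaning "mem[b] -= mem[a]; jump to c if the result is less than or equal to zero, otherwise fall through". The memory layout, the negative-address halt convention, and the names `run_subleq` and `emulate_add` are hypothetical and are not taken from the OIC design. Python integers are unbounded, so 32-bit wraparound is not modeled.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Execute subleq triples stored in mem until pc goes negative."""
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                      # the only arithmetic operation
        pc = c if mem[b] <= 0 else pc + 3     # branch if result <= 0
        steps += 1
    return mem


def emulate_add(x, y):
    """Compute x + y using only subleq (hypothetical layout, for illustration)."""
    A, B, Z = 9, 10, 11      # data addresses: operand, accumulator, scratch zero
    mem = [
        A, Z, 3,             # Z = Z - A = -x; both outcomes continue at address 3
        Z, B, 6,             # B = B - Z = y + x; both outcomes continue at address 6
        Z, Z, -1,            # Z = 0, which is <= 0, so jump to -1 and halt
        x, y, 0,             # data words: A, B, Z
    ]
    run_subleq(mem)
    return mem[B]


total = emulate_add(5, 7)
```

Here one ADD costs three subleq steps, which illustrates where the performance overhead of emulation comes from: every migrated instruction expands into a short sequence of subtract-and-branch operations.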