As the scale and integration density of networks-on-chip increase sharply, more transistors are being integrated into a single chip. Unfortunately, this leads to more unexpected variations and faults in the system. In particular, transient errors and permanent hardware faults have rapidly become a key constraint on large-scale network design. This trend highlights the need to incorporate fault-tolerant solutions into Network-on-Chip (NoC) architectures. In this paper, we propose a REliable PArtIal-Redundancy-based router architecture (REPAIR). The proposed scheme uses only one additional buffer and one bus to enhance the connectivity of the router's data path. REPAIR also employs error control coding (ECC) modules for efficient online diagnosis and decision-table-based (DT) control logic for reconfiguration. Experimental results show that REPAIR tolerates hard faults effectively even at high fault rates. Specifically, the silicon protection factor (SPF) of an individual router reaches 16.34, and over 95% of packets can still be transferred successfully in a 16×16 torus network with 650 faults.
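The SPF figure quoted above is easier to interpret with its arithmetic in view. SPF is commonly defined as the mean number of faults a design tolerates before it fails, divided by its silicon area normalized to an unprotected baseline router. The sketch below assumes REPAIR follows this convention. The function name and the example numbers are hypothetical and are not taken from the paper's evaluation.

```python
def silicon_protection_factor(mean_faults_to_failure: float,
                              protected_area: float,
                              baseline_area: float) -> float:
    """Compute SPF, assuming the common definition:
    mean faults tolerated before failure / (protected area / baseline area).
    Higher is better: more fault tolerance per unit of extra silicon.
    """
    if mean_faults_to_failure <= 0:
        raise ValueError("mean_faults_to_failure must be positive")
    if protected_area <= 0 or baseline_area <= 0:
        raise ValueError("areas must be positive")
    # Area overhead factor relative to the unprotected baseline router.
    area_overhead = protected_area / baseline_area
    # Normalize fault tolerance by the area cost of achieving it.
    return mean_faults_to_failure / area_overhead


# Hypothetical example: a router that survives 20 faults on average
# while occupying 1.25x the baseline area.
print(silicon_protection_factor(20.0, 1.25, 1.0))  # 16.0
```

Under this definition, a partial-redundancy design with a small area overhead can reach a high SPF without tolerating an extreme number of faults, because the denominator stays close to 1.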