Continuous technology scaling in the semiconductor industry makes system reliability a serious concern in nanoscale computing. In this paper, a fully adaptive routing algorithm is proposed to overcome faults in NoCs (Networks-on-Chip). The algorithm, called DINRA-NoC (DIstributed and New Routing Algorithm for NoC), is distributed, fault-tolerant and congestion-aware. First, each node selects the appropriate output for routing packets to neighboring routers according to the state of each link and router. Second, the proposed routing algorithm takes into account the traffic status of adjacent routers to update the congestion metric. DINRA-NoC does not use any VCs (Virtual Channels) and is deadlock-free. The proposed routing algorithm has been simulated using the Noxim simulator. The results show that DINRA ensures a good reliability rate despite the presence of many faulty routers and links. In addition, the simulation results indicate that the proposed routing algorithm outperforms existing algorithms by lowering congestion, improving average latency and increasing throughput.
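The per-node output selection described in this abstract can be illustrated with a minimal Python sketch. It is not the DINRA-NoC algorithm itself: the 2D-mesh coordinates, the `Port` record, the integer congestion metric and the fallback to non-minimal ports are all assumptions made for illustration, and the sketch makes no attempt to guarantee deadlock freedom.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Hypothetical view a router keeps of one output port."""
    name: str          # "N", "E", "S" or "W"
    link_ok: bool      # state of the outgoing link
    neighbor_ok: bool  # state of the router behind that link
    congestion: int    # assumed metric, e.g. occupied buffer slots reported by the neighbor


def productive_ports(cur, dst):
    """Names of the ports that move a packet closer to dst in a 2D mesh (y grows north)."""
    (x, y), (dx, dy) = cur, dst
    names = []
    if dx > x:
        names.append("E")
    if dx < x:
        names.append("W")
    if dy > y:
        names.append("N")
    if dy < y:
        names.append("S")
    return names


def select_output(cur, dst, ports):
    """Pick a fault-free output port, preferring minimal paths, then lowest congestion."""
    if cur == dst:
        return "LOCAL"
    healthy = {p.name: p for p in ports if p.link_ok and p.neighbor_ok}
    minimal = [healthy[n] for n in productive_ports(cur, dst) if n in healthy]
    # Assumed fallback: detour through any healthy port when every minimal port is faulty.
    candidates = minimal or list(healthy.values())
    if not candidates:
        return None  # router is isolated
    return min(candidates, key=lambda p: p.congestion).name


ports = [
    Port("N", True, True, 2),
    Port("E", True, True, 5),
    Port("S", True, True, 1),
    Port("W", True, True, 4),
]
choice = select_output((1, 1), (3, 2), ports)  # both "E" and "N" are productive here
```

In this usage example the less congested of the two productive ports is chosen; marking that port's link as faulty makes the function fall back to the other productive port.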
Many fault tolerance techniques have been proposed for Networks-on-Chip to cope with defects introduced during fabrication or faults that occur during the product lifetime. Fault-tolerant routing algorithms provide reliable mechanisms that keep delivering service despite nodes made defective by permanent and/or transient faults throughout their operational lifetime. This paper presents a new approach in the domain of fault-tolerant NoCs with two main contributions. First, we consider a unified fault model that includes transient faults, permanent faults and congestion, which is treated as a fault. Second, we present a new architecture based on sub-nets and give an overview of the associated test and (re)routing algorithm. The main result of this paper is a new routing algorithm called Collaborative Routing Algorithm for Fault Tolerance in Network on Chip (CRAFT-NoC). We compare our approach with ACO-FAR, which also considers congestion and permanent faults. Our simulation results show significant improvements in both latency and reliability.
The Network-on-Chip (NoC) has become a promising communication infrastructure for the Multiprocessor System-on-Chip (MPSoC). Reliability is a main concern in NoCs, and performance degrades when a NoC is susceptible to faults. A fault can be defined as the cause of a deviation from the desired operation of the system (an error). To deal with these reliability challenges, this work proposes OFDIM (Online Fault Detection and Isolation Mechanism), a novel combined methodology for tolerating multiple permanent and transient faults. The new router architecture uses two modules to ensure a highly reliable and low-cost fault-tolerant strategy. In contrast to existing works, our architecture offers less area, more fault tolerance, and high reliability. The reliability comparison using the Silicon Protection Factor (SPF) shows a 22-fold improvement, and the additional circuitry incurs an area overhead of 27%, which is better than state-of-the-art reliable router architectures. The results also show that throughput decreases by only 5.19% and average latency increases by only 2.40% while high reliability is maintained.
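The abstract does not spell out how SPF is computed. A definition commonly used for reliable router architectures divides the mean number of faults needed to cause a failure by the normalized area (1 plus the area overhead); the sketch below assumes that definition. The fault count of 10 is a made-up illustration value, not a figure from the paper; only the 27% overhead comes from the abstract.

```python
def silicon_protection_factor(mean_faults_to_failure, area_overhead):
    """SPF under the assumed definition: mean faults to failure / (1 + area overhead).

    area_overhead is a fraction, e.g. 0.27 for 27%.
    """
    if mean_faults_to_failure < 0 or area_overhead < 0:
        raise ValueError("inputs must be non-negative")
    return mean_faults_to_failure / (1.0 + area_overhead)


# Hypothetical example: a router that survives 10 faults on average
# and pays the 27% area overhead quoted in the abstract.
spf_protected = silicon_protection_factor(10, 0.27)

# With no protection circuitry the overhead is zero, so SPF equals the fault count.
spf_no_overhead = silicon_protection_factor(10, 0.0)
```

Under this definition, a larger area overhead lowers the SPF for the same fault tolerance, which is why the metric is used to weigh reliability gains against silicon cost.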
To provide correct data transmission and to handle the communication requirements, the routing algorithm should find a new path to steer packets from the source to the destination in a faulty network. Many solutions have been proposed to overcome faults in networks-on-chip (NoCs). This article introduces a new fault-tolerant routing algorithm that tolerates permanent and transient faults in NoCs. This solution, called DINRA, provides congestion avoidance and fault tolerance simultaneously. In this work, a novel approach inspired by Catnap is proposed for NoCs, using local and global congestion detection mechanisms with a hierarchical sub-network architecture. The evaluation (of reliability, latency and throughput) shows the effectiveness of this approach in improving NoC performance compared to the state of the art. In addition, with the test module and fault register integrated into the basic architecture, the routers are able to detect faults dynamically and re-route packets to fault-free and congestion-free zones.