As silicon technology scales, modern processors and embedded systems are rapidly shifting towards complex chip multi-processor (CMP) and system-on-chip (SoC) designs, comprising several processor cores and IP components communicating via a network-onchip (NoC). As a side-effect of this trend, ensuring their correctness has become increasingly problematic. In particular, the networkon-chip often includes complex features and components to support the required communication bandwidth among the nodes in the system. In this landscape, it is no wonder that design errors in the NoC may go undetected and escape into the final silicon, with potential detrimental impact on the overall system.In this work, we propose ForEVeR, a solution that complements the use of formal methods and runtime verification to ensure functional correctness in NoCs. Formal verification, due to its scalability limitations, is used to verify the smaller modules, such as individual router components. We complete the protection against escaped design errors with a runtime technique, a network-level error detection and recovery solution, which monitors the traffic in the NoC and protects it against escaped functional bugs that affect the communication paths in the network. To this end, ForEVeR augments the baseline NoC with a lightweight checker network that alerts destination nodes of incoming packets ahead of time. If a bug is detected, flagged by missed packet arrivals, a recovery mechanism delivers the in-flight data safely to the intended destination via the checker network. ForEVeR's experimental evaluation shows that it can recover from NoC design errors at only 4.8% area cost for an 8x8 mesh interconnect, with a recovery performance cost of less than 30K cycles per functional bug manifestation. Additionally, it incurs no performance overhead in the absence of errors.
With the advent of multicore processors and system-on-chip designs, intra-chip communication demands have exacerbated, leading to a growing adoption of scalable networks-on-chip (NoCs) as the interconnect fabric. Today, conventional NoC designs may consume up to 30% of the entire chip's power budget, in large part due to leakage power. In this work, we address this issue by proposing Panthre: our solution deploys power-gating to provide long intervals of uninterrupted sleep to selected units. Packets that would normally use power-gated components are steered away via topology and routing reconfiguration, while Panthre provides low-latency alternate paths to their destinations. The routing reconfiguration operates in a distributed fashion and guarantees that deadlock-free routes are available at all times. At runtime, Panthre adapts to the application's communication patterns by updating its power-gating decisions. It employs a feedback-based distributed mechanism to control the amount of sleeping components and of packets detours, so that performance degradation is kept at a minimum. Our design is flexible, providing a mechanism that designers can use to tradeoff power savings with performance, based on application's requirements.Our experiments on multi-programmed communication-light workloads from the SPEC CPU2006 suite show that Panthre reduces total network power consumption by 14.5% on average, with only a 1.8% degradation in performance, when all processor nodes are active. At times when 15-25% of the processor cores are communication-idle, Panthre enables leakage power savings of 36.9% on average, while still providing connected and deadlock-free routes for all other nodes.
Abstract-We propose a comprehensive yet low-cost solution for online detection and diagnosis of permanent faults in on-chip networks. Using error syndrome collection and packet/flit-counting techniques, highresolution defect diagnosis is feasible in both datapath and control logic of the on-chip network without injecting any test traffic or incurring significant performance overhead.
Abstract-The expected low reliability of the silicon substrate at upcoming technology nodes presents a key challenge for digital system designers. Networks-on-chip (NoCs) are especially concerning because they are often the only communication infrastructure for the chips in which they are deployed. Recently, routing reconfiguration solutions have been proposed to address this problem. However, they come at a high silicon cost, and often require suspending the normal network activity while executing a centralized, resource-hungry reconfiguration algorithm. This paper proposes a novel, fast and minimalistic routing reconfiguration algorithm, called BLINC. BLINC utilizes precomputed routing metadata to quickly evaluate localized detours upon each fault manifestation. We showcase the efficacy of our algorithm by deploying it in a novel NoC fault detection and reconfiguration solution, where BLINC enables uninterrupted NoC operation during aggressive online testing. If a fault seems likely to occur, we circumvent it in advance with the aid of our BLINC reconfiguration algorithm. Experimental results show an 80% reduction in the average number of routers affected by a reconfiguration event, compared to state-of-the-art techniques. BLINC enables negligible performance degradation in our detection and reconfiguration solution, while solutions based on current techniques suffer a 17-fold latency increase.
With the advent of multicore processors and system-on-chip designs, intra-chip communication demands have exacerbated, leading to a growing adoption of scalable networks-on-chip (NoCs) as the interconnect fabric. Today, conventional NoC designs may consume up to 30% of the entire chip's power budget, in large part due to leakage power. In this work, we address this issue by proposing Panthre: our solution deploys power-gating to provide long intervals of uninterrupted sleep to selected units. Packets that would normally use power-gated components are steered away via topology and routing reconfiguration, while Panthre provides low-latency alternate paths to their destinations. The routing reconfiguration operates in a distributed fashion and guarantees that deadlock-free routes are available at all times. At runtime, Panthre adapts to the application's communication patterns by updating its power-gating decisions. It employs a feedback-based distributed mechanism to control the amount of sleeping components and of packets detours, so that performance degradation is kept at a minimum. Our design is flexible, providing a mechanism that designers can use to tradeoff power savings with performance, based on application's requirements. Our experiments on multi-programmed communication-light workloads from the SPEC CPU2006 suite show that Panthre reduces total network power consumption by 14.5% on average, with only a 1.8% degradation in performance, when all processor nodes are active. At times when 15-25% of the processor cores are communication-idle, Panthre enables leakage power savings of 36.9% on average, while still providing connected and deadlock-free routes for all other nodes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.