The industry is rapidly moving towards the adoption of Chip Multi-Processors (CMPs) of Simultaneous MultiThreaded (SMT) cores for general purpose systems. The most prominent use of such processors, at least in the near term, will be as job servers running multiple independent threads on the different contexts of the various SMT cores. In such an environment, the co-scheduling of phases from different threads plays a significant role in the overall throughput. Less throughput is achieved when phases from different threads that conflict for particular hardware resources are scheduled together, compared with the situation where compatible phases are co-scheduled on the same SMT core. Achieving the latter requires precise per-phase hardware statistics that the scheduler can use to rapidly identify possible incompatibilities among phases of different threads, thereby avoiding the potentially high performance cost of inter-thread contention. In this paper, we devise phase co-scheduling policies for a dual-core CMP of dual-threaded SMT processors. We explore a number of approaches and find that the use of ready and in-flight instruction metrics permits effective co-scheduling of compatible phases among the four contexts. This approach significantly outperforms the worst static grouping of threads, and very closely matches the best static grouping, even outperforming it by as much as 7%.
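As a rough illustration of how ready and in-flight instruction metrics could drive co-scheduling on a dual-core CMP of dual-threaded SMT cores, the sketch below enumerates the three possible ways to split four threads into two pairs and picks the split whose most heavily loaded core is least loaded. The cost function (a weighted sum of the two counts), the weights, the function names, and the sample numbers are all assumptions made for illustration; the abstract does not specify the actual policies.

```python
def pairing_cost(pairing, ready, in_flight, w_ready=1.0, w_flight=1.0):
    """Hypothetical cost: the larger of the two per-core pressures.

    A core's pressure is the weighted sum of the ready and in-flight
    instruction counts of the threads placed on it (assumed metric).
    """
    def core_pressure(core):
        return sum(w_ready * ready[t] + w_flight * in_flight[t] for t in core)
    return max(core_pressure(core) for core in pairing)


def best_pairing(threads, ready, in_flight):
    """Try all three splits of four threads into two pairs; keep the cheapest."""
    if len(threads) != 4:
        raise ValueError("this sketch assumes exactly four hardware contexts")
    first = threads[0]
    candidates = []
    for partner in threads[1:]:
        core0 = (first, partner)
        core1 = tuple(t for t in threads if t not in core0)
        candidates.append((core0, core1))
    return min(candidates, key=lambda p: pairing_cost(p, ready, in_flight))


# Made-up per-phase statistics for four threads.
ready = {"A": 30, "B": 28, "C": 5, "D": 4}
in_flight = {"A": 60, "B": 55, "C": 12, "D": 10}

print(best_pairing(["A", "B", "C", "D"], ready, in_flight))
# → (('A', 'D'), ('B', 'C'))
```

With these numbers the two high-pressure threads A and B are kept on different cores, which is the kind of grouping a contention-aware scheduler would be expected to prefer. In a phase-aware setting the same selection would be repeated whenever the per-phase statistics change.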
In this paper, we present a novel circuit design approach for radiation hardened digital electronics. Our approach is based on the use of shadow gates, whose task it is to protect the primary gate in case it is struck by a heavy cosmic ion. We locally duplicate the gate to be protected, and connect a pair of transistors (or diodes) between the outputs of the original and shadow gates. These transistors turn on when the voltages of the two gates deviate during a radiation strike. Our experiments show that at the level of a single gate, our circuit structure has a delay overhead of about 4% on average, and an area overhead of over 100%. At the circuit level, however, we do not need to protect all gates. We present a methodology to selectively protect specific gates of the circuit in a manner that guarantees radiation tolerance for the entire circuit. With this methodology, we demonstrate that at the circuit level, the delay overhead is about 4% and the placed-and-routed area overhead is 30%, compared to an unprotected circuit (for delay mapped designs).
Categories and Subject Descriptors: B.8.2 [Performance and Reliability]: Reliability, Testing, and Fault-Tolerance
General Terms: Design, Reliability
... on the affected circuit node. If the magnitude of this spike is sufficiently large, an erroneous value may be computed by the circuit. This is particularly problematic for memories, which can flip their stored state as a result of such a radiation strike. Combinational logic may also be affected by such strikes, if the resulting glitch occurs at the time the circuit outputs are being sampled. Such bit reversals are referred to as Single Event Upsets (SEUs) [12], or soft errors in the case of memory.
The charge deposition rate is also referred to as the Linear Energy Transfer (LET). Cosmic ions have varying LETs, and they result in the deposition of a charge Q in a semiconductor diffusion region of depth t by the following formula [11].
$$Q = 0.01036 \cdot L \cdot t$$
Here L is the LET of the ion (expressed in MeV·cm²/mg), t is the depth of the collection volume (expressed in microns), and Q is charge in pC. The amount of charge that is required to cause a bit to be sampled incorrectly is referred to as the critical charge, Qc [13]. With diminishing process feature sizes and supply voltages, SEU problems are ...
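The charge relation quoted above, Q = 0.01036 · L · t, can be checked with a short calculation. The sketch below evaluates it for one ion and compares the result against a critical charge. The function names and the numeric inputs (LET, collection depth, critical charge) are illustrative assumptions, and treating "deposited charge at or above Qc" as an upset is a simplification that ignores pulse shape and sampling time.

```python
import math


def deposited_charge_pc(let_mev_cm2_per_mg: float, depth_um: float) -> float:
    """Charge Q in pC deposited by an ion: Q = 0.01036 * L * t.

    L is the LET in MeV·cm²/mg and t is the collection depth in microns.
    """
    return 0.01036 * let_mev_cm2_per_mg * depth_um


def exceeds_critical_charge(let_mev_cm2_per_mg: float, depth_um: float,
                            q_crit_pc: float) -> bool:
    """Simplified upset check: is the deposited charge at least Qc?"""
    return deposited_charge_pc(let_mev_cm2_per_mg, depth_um) >= q_crit_pc


# Assumed example values, not taken from the paper.
let_value = 20.0   # MeV·cm²/mg
depth = 1.0        # microns
q_crit = 0.1       # pC

q = deposited_charge_pc(let_value, depth)
print(f"Q = {q:.4f} pC")                      # roughly 0.207 pC
print(exceeds_critical_charge(let_value, depth, q_crit))
assert math.isclose(q, 0.2072)
```

Because Q scales linearly with both L and t, halving either the LET or the collection depth halves the deposited charge.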
The problem of determining bounds for application completion times running on generic systems comprised of single or multiple voltage-frequency islands (VFIs) with arbitrary topologies is addressed in the context of manufacturing-driven variability. The approach provides an exact solution for the system-level timing yield in single clock, single voltage (SSV) and VFI systems with an underlying tree-based topology, and a tight upper bound for generic, non-tree based topologies. The results show that: (a) timing yield for overall source-to-sink completion time for generic systems can be modeled in an exact manner for both SSV and VFI systems; and (b) multiple VFI, latency-constrained systems can achieve 11-90% higher timing yield than their SSV counterparts. The results are proven formally and supported by experimental results on two embedded applications, namely software defined radio and MPEG2 encoder.
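To make the notion of timing yield concrete, the sketch below estimates it by Monte Carlo simulation for a small tree-shaped task graph: the yield is taken as the fraction of sampled systems whose source-to-sink completion time meets a latency constraint. This is a numerical stand-in, not the exact analytical method the abstract describes. The independent Gaussian delay model, the 5% relative spread, the example tree, and all names are assumptions.

```python
import random


def completion_time(node, children, delay):
    """Completion time of a tree rooted at `node`: own delay plus slowest subtree."""
    kids = children.get(node, [])
    return delay[node] + max(
        (completion_time(k, children, delay) for k in kids), default=0.0
    )


def timing_yield(children, nominal, rel_sigma, deadline, root,
                 trials=20000, seed=0):
    """Monte Carlo estimate of P(completion time <= deadline).

    Each node delay is drawn independently from a Gaussian centred on its
    nominal value (assumed variability model), clipped at zero.
    """
    rng = random.Random(seed)
    met = 0
    for _ in range(trials):
        delay = {n: max(0.0, rng.gauss(mu, rel_sigma * mu))
                 for n, mu in nominal.items()}
        if completion_time(root, children, delay) <= deadline:
            met += 1
    return met / trials


# Hypothetical tree: the source fans out to two branches of unequal length.
children = {"src": ["a", "b"], "a": ["a1"], "b": []}
nominal = {"src": 2.0, "a": 3.0, "a1": 1.0, "b": 3.5}

for deadline in (5.8, 6.0, 6.4):
    y = timing_yield(children, nominal, 0.05, deadline, "src")
    print(f"deadline {deadline}: estimated yield {y:.3f}")
```

Since the same seed reuses the same delay samples, the estimated yield can only stay the same or grow as the deadline is relaxed.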