Robert M. Senger scite author profile

Soraya Ghiasi, Tuyet Nguyen, Norman James, ... IBM, Austin, TX

Scaling has caused an increase in process variation and in the sensitivity of cycle time to workload and environmental conditions, making it difficult to predict the cycle time of microprocessors. Cycle time is determined by the required performance with an added timing margin determined by the acceptable yield. After manufacture, microprocessors are binned into performance categories to account for process variation, but because of the influence of the workload on cycle time, there is a danger of losing performance with overly conservative timing margins. A critical-path monitor (CPM) that measures critical-path delay and the effects of noise and localized VDD droops on timing is designed as part of the POWER6™ microprocessor. The CPM also measures across-chip process variation.

Many different timing paths in the microprocessor may be critical. The CPM uses a small number of delay paths with different delay versus process, voltage, and temperature (DvPVT) curves to synthesize the critical paths. It is a time-to-digital converter that uses the system clock as the reference signal for conversion. The CPM, shown in Fig. 22.1.1, is composed of an edge-launching latch, delay-synthesis block, edge detector, data-analysis block, and con...

The CPM captures rising and falling edge delay on alternating clock cycles. The core CPM is 90×36 µm² and the nest CPM is 90×48 µm² in 65nm SOI. There are 24 CPMs distributed across the microprocessor (Fig. 22.1.2): 8 in each core and 8 in the nest. Because of the time-to-digital nature of the output, its sensitivity to multiple variables, and its distribution across the microprocessor, the CPM measures local power-supply droop, clock instability, process variation, NBTI and early aging effects, and temperature in addition to timing, although it is not always possible to separate these effects.

Figure 22.1.3 shows the measured delay versus voltage on nominal parts for the core CPM paths. The paths are normalized to demonstrate the different slopes of each delay path. There is some divergence in the pass-gate and wire delay paths from the MOS paths at the edges of the operating range. The small variation in the wire-delay path demonstrates that for even large percentages of wire delay, the MOS delay variation dominates the path delay.

Figure 22.1.4 shows the average maximum frequency of the microprocessor versus the measured bit position of the adder path for the CPMs in core 0 at each voltage. The curve is generated by running a heavy workload and increasing the frequency until failure. If the CPM exactly tracks the critical path, the bit position at failure should not change. There is an average of three bits of rise in the bit position as voltage rises, indicating the adder path does not exactly match the critical path. None of the paths exactly track the critical path, but because the output is a the...

Looking under the hood of the IBM Blue Gene/Q network

Chen

Eisley

Heidelberger

et al. 2012

Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache

Ravindran

Nagarkar

Dasika

et al.

Modern embedded microprocessors use low power on-chip memories called scratch-pad memories to store frequently executed instructions and data. Unlike traditional caches, scratch-pad memories lack the complex tag checking and comparison logic, thereby proving to be efficient in area and power. In this work, we focus on exploiting scratch-pad memories for storing hot code segments within an application. Static placement techniques focus on placing the most frequently executed portions of programs into the scratch-pad. However, static schemes are inherently limited by not allowing the contents of the scratch-pad memory to change at run time. In a large fraction of applications, the instruction memory footprints exceed the scratch-pad memory size, thereby limiting the usefulness of the scratch-pad. We propose a compiler managed dynamic placement algorithm, wherein multiple hot code sequences, or traces, are overlapped with each other in the scratch-pad memory at different points in time during execution. Special copy instructions are provided to copy the traces into the scratch-pad memory at run-time. Using a power estimate, the compiler initially selects the most frequent traces in an application for relocation into the scratch-pad memory. Through iterative code motion and redundancy elimination, copy instructions are inserted in infrequently executed regions of the code. For a 64-byte code cache, the compiler managed dynamic placement achieves an average of 64% energy improvement over the static solution in a low-power embedded microcontroller.

The IBM Blue Gene/Q Interconnection Fabric

Chen

Eisley

Heidelberger

et al. 2012

IEEE Micro

Robert M. Senger

The IBM Blue Gene/Q interconnection network and message unit

A Distributed Critical-Path Timing Monitor for a 65nm High-Performance Microprocessor