Soraya Ghiasi, Tuyet Nguyen, Norman James, Micha...
IBM, Austin, TX

Scaling has caused an increase in process variation and in the sensitivity of cycle time to workload and environmental conditions, making it difficult to predict the cycle time of microprocessors. Cycle time is determined by the required performance with an added timing margin determined by the acceptable yield. After manufacture, microprocessors are binned into performance categories to account for process variation, but because of the influence of the workload on cycle time, there is a danger of losing performance with overly conservative timing margins. A critical-path monitor (CPM) that measures critical-path delay and the effects of noise and localized VDD droops on timing is designed as part of the POWER6™ microprocessor. The CPM also measures across-chip process variation and [...]

[...] the microprocessor, many different timing paths may be critical. The CPM uses a small number of delay paths with different delay versus process, voltage, and temperature (DvPVT) curves to synthesize the critical paths. It is a time-to-digital converter that uses the system clock as the reference signal for conversion. The CPM, shown in Fig. 22.1.1, is composed of an edge-launching latch, delay-synthesis block, edge detector, data-analysis block, and con...

The CPM captures rising and falling edge delay on alternating clock cycles. The core CPM is 90×36 µm² and the nest CPM is 90×48 µm² in 65nm SOI. There are 24 CPMs distributed across the microprocessor (Fig. 22.1.2): 8 in each core and 8 in the nest. Because of the time-to-digital nature of the output, its sensitivity to multiple variables, and its distribution across the microprocessor, the CPM measures local power-supply droop, clock instability, process variation, NBTI and early aging effects, and temperature in addition to timing, although it is not always possible to separate these effects.

Figure 22.1.3 shows the measured delay versus voltage on nominal parts for the core CPM paths. The paths are normalized to demonstrate the different slopes of each delay path. There is some divergence in the pass-gate and wire delay paths from the MOS paths at the edges of the operating range. The small variation in the wire-delay path demonstrates that for even large percentages of wire delay, the MOS delay variation dominates the path delay. Figure 22.1.4 shows the average maximum frequency of the microprocessor versus the measured bit position of the adder path for the CPMs in core 0 at each voltage. The curve is generated by running a heavy workload and increasing the frequency until failure. If the CPM exactly tracks the critical path, the bit position at failure should not change. There is an average of three bits of rise in the bit position as voltage rises, indicating the adder path does not exactly match the critical path. None of the paths exactly track the critical path, but because the output is a the...
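As a rough illustration of the time-to-digital conversion described above, the sketch below models an edge that is launched at the start of a clock cycle, delayed by a synthesized path, and then captured by a tapped edge detector at the next clock edge. The function name, the tap delay, the number of taps, and the linear slack model are all assumptions made for illustration; none of them are taken from the paper.

```python
def cpm_bit_position(clock_period_ps, path_delay_ps, tap_delay_ps=10.0, num_taps=12):
    """Return how many detector taps the edge passed before the clock captured it.

    Hypothetical model: the slack left after the synthesized path delay is
    quantized in units of one tap delay and clamped to the detector length.
    """
    slack_ps = clock_period_ps - path_delay_ps
    if slack_ps <= 0:
        return 0  # the edge did not reach the detector within the cycle
    return min(int(slack_ps // tap_delay_ps), num_taps)


# A slower path (for example, during a supply droop) leaves less slack,
# so the captured edge sits at a lower bit position.
nominal = cpm_bit_position(clock_period_ps=250.0, path_delay_ps=200.0)
drooped = cpm_bit_position(clock_period_ps=250.0, path_delay_ps=230.0)
print(nominal, drooped)
```

In this toy model the system clock period is the only timing reference, which mirrors the statement that the clock serves as the reference signal for conversion.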
Modern embedded microprocessors use low-power on-chip memories called scratch-pad memories to store frequently executed instructions and data. Unlike traditional caches, scratch-pad memories lack complex tag-checking and comparison logic, which makes them efficient in area and power. In this work, we focus on exploiting scratch-pad memories for storing hot code segments within an application. Static placement techniques place the most frequently executed portions of a program into the scratch-pad. However, static schemes are inherently limited because they do not allow the contents of the scratch-pad memory to change at run time. In a large fraction of applications, the instruction memory footprint exceeds the scratch-pad memory size, which limits the usefulness of the scratch-pad. We propose a compiler-managed dynamic placement algorithm in which multiple hot code sequences, or traces, are overlapped with each other in the scratch-pad memory at different points in time during execution. Special copy instructions are provided to copy the traces into the scratch-pad memory at run time. Using a power estimate, the compiler initially selects the most frequent traces in an application for relocation into the scratch-pad memory. Through iterative code motion and redundancy elimination, copy instructions are inserted in infrequently executed regions of the code. For a 64-byte code cache, the compiler-managed dynamic placement achieves an average of 64% energy improvement over the static solution in a low-power embedded microcontroller.
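The initial trace-selection step can be pictured as a greedy pass over profiled traces. The sketch below is a simplification under stated assumptions: the `Trace` fields, the per-byte energy values, and the greedy saving-per-byte ordering are hypothetical, and it models only a single fill of the scratch-pad, not the overlapped dynamic placement or the copy-instruction insertion that the abstract proposes.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    name: str
    size_bytes: int
    exec_count: int  # profiled number of executions (assumed input)


def select_traces(traces, capacity_bytes, main_energy=10.0, spm_energy=1.0):
    """Greedily pick traces with the best estimated energy saving per byte."""

    def saving(t):
        # Assumed energy model: each execution fetches every byte once.
        return t.exec_count * t.size_bytes * (main_energy - spm_energy)

    chosen, used = [], 0
    ranked = sorted(traces, key=lambda t: saving(t) / t.size_bytes, reverse=True)
    for t in ranked:
        if used + t.size_bytes <= capacity_bytes:
            chosen.append(t.name)
            used += t.size_bytes
    return chosen


traces = [Trace("a", 32, 1000), Trace("b", 48, 5000), Trace("c", 16, 200)]
print(select_traces(traces, capacity_bytes=64))
```

With a 64-byte capacity, trace "a" is skipped because it no longer fits once "b" is placed, which is the kind of limitation that motivates changing the scratch-pad contents at run time.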