Erika Gunadi scite author profile

Combating Aging with the Colt Duty Cycle Equalizer

Bias temperature instability, hot-carrier injection, and gate-oxide wearout will cause severe lifetime degradation in the performance and the reliability of future CMOS devices. The design guardband to counter these negative effects will be too expensive, largely due to the worst-case behavior induced by the uneven utilization of devices on the chip. To mitigate these effects over a chip's lifetime, this paper proposes Colt, a simple yet holistic scheme to balance the utilization of devices in a processor by equalizing the duty cycle ratio of circuits' internal nodes and the usage frequency of devices. Colt relies on alternating true- and complement-mode operations to equalize the duty cycle ratio of signals (and thus the utilization of devices) in most data path and storage devices. Colt also employs a pseudorandom indexing scheme to balance the usage of entries in storage structures that often exhibit highly uneven utilization of entries. Finally, an operand-swapping scheme equalizes utilization of the left and right operand data paths. The proposed mechanisms impose trivial overhead in area, complexity, power, and performance, while recapturing 27% of aging-induced performance degradation and improving mean time to failure by an estimated 40%.
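
The true- and complement-mode idea in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's circuit: the 8-bit width, the fixed epoch length, and the names `encode`, `decode`, and `duty_cycles` are all hypothetical, and the abstract does not say how or when the hardware switches modes. The model stores each value either as-is (true mode) or bitwise inverted (complement mode) and measures the fraction of time each stored bit holds a 1.

```python
WIDTH = 8                    # assumed storage width, for illustration only
MASK = (1 << WIDTH) - 1

def encode(value, complement_mode):
    """Return the bit pattern actually held in storage for this mode."""
    return (~value & MASK) if complement_mode else (value & MASK)

# Inversion is its own inverse, so decoding reuses the same operation.
decode = encode

def duty_cycles(values, epoch_len, alternate):
    """Fraction of time each stored bit is 1, one value per time step.

    With alternate=True the mode flips every `epoch_len` steps
    (a hypothetical switching policy).
    """
    ones = [0] * WIDTH
    for t, v in enumerate(values):
        mode = alternate and (t // epoch_len) % 2 == 1
        stored = encode(v, mode)
        # The logical value is always recoverable from the stored pattern.
        assert decode(stored, mode) == v & MASK
        for b in range(WIDTH):
            ones[b] += (stored >> b) & 1
    return [n / len(values) for n in ones]

# A heavily skewed workload: the same value is held for 1000 steps.
skewed = [0x01] * 1000
baseline = duty_cycles(skewed, epoch_len=100, alternate=False)
balanced = duty_cycles(skewed, epoch_len=100, alternate=True)
```

In this toy workload, every bit in `baseline` sits at a constant 0 or 1, while every bit in `balanced` spends half of the time in each state, which is the kind of equalization the abstract describes.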

Physical register inlining

Lipasti

Mestan

Gunadi

Physical register access time increases the delay between scheduling and execution in modern out-of-order processors. As the number of physical registers increases, this delay grows, forcing designers to employ register files with multicycle access. This paper advocates more efficient utilization of fewer physical registers in order to reduce the access time of the physical register file. Register values with few significant bits are stored in the rename map using physical register inlining, a scheme analogous to inlining of operand fields in data structures. Specifically, whenever a register value can be expressed with fewer bits than the register map would need to specify a physical register number, the value is stored directly in the map, avoiding the indirection and saving space in the physical register file. Not surprisingly, we find that a significant portion of all register operands can be stored in the map in this fashion, and we describe straightforward microarchitectural extensions that correctly implement physical register inlining. We find that physical register inlining performs well, particularly in processors that are register-constrained.
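
A minimal software sketch of the inlining idea described above, under several assumptions that are not in the abstract: a 128-entry physical register file, signed two's-complement values, a one-bit `inlined` flag per map entry, and values that are known at the moment the map is written (real hardware learns the value only after execution). The names `RenameMap`, `MapEntry`, and `fits_inline` are hypothetical.

```python
from dataclasses import dataclass

NUM_PHYS_REGS = 128                           # assumed register file size
TAG_BITS = (NUM_PHYS_REGS - 1).bit_length()   # bits needed to name a physical register

@dataclass
class MapEntry:
    inlined: bool   # True: payload is the value itself
    payload: int    # the value, or a physical register number

def fits_inline(value, bits=TAG_BITS):
    """True if `value` fits in a signed field of `bits` bits (an assumption)."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

class RenameMap:
    def __init__(self, num_arch_regs=32):
        self.entries = [MapEntry(True, 0) for _ in range(num_arch_regs)]
        self.phys_regs = [0] * NUM_PHYS_REGS
        self.free_list = list(range(NUM_PHYS_REGS))

    def write(self, arch_reg, value):
        old = self.entries[arch_reg]
        if not old.inlined:
            # The previous mapping's physical register is no longer needed.
            self.free_list.append(old.payload)
        if fits_inline(value):
            # Narrow value: keep it in the map, no physical register used.
            self.entries[arch_reg] = MapEntry(True, value)
        else:
            preg = self.free_list.pop()
            self.phys_regs[preg] = value
            self.entries[arch_reg] = MapEntry(False, preg)

    def read(self, arch_reg):
        entry = self.entries[arch_reg]
        # Inlined values avoid the indirection through the register file.
        return entry.payload if entry.inlined else self.phys_regs[entry.payload]

rmap = RenameMap()
rmap.write(1, 5)       # narrow value, stored in the map
rmap.write(2, 1000)    # wide value, needs a physical register
```

After these two writes, only one physical register is occupied; overwriting register 2 with a narrow value would return it to the free list.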

Physical Register Inlining

Lipasti

Mestan

Gunadi

2004

SIGARCH Comput. Archit. News

Crib

Gunadi

Lipasti

2011

Conventional high-performance processors utilize register renaming and complex broadcast-based scheduling logic to steer instructions into a small number of heavily-pipelined execution lanes. This requires multiple complex structures and repeated dependency resolution, imposing a significant dynamic power overhead. This paper advocates in-place execution of instructions, a power-saving, pipeline-free approach that consolidates rename, issue, and bypass logic into one structure, the CRIB, while simultaneously eliminating the need for a multiported register file, instead storing architected state in a simple rank of latches. CRIB achieves the high IPC of an out-of-order machine while keeping the execution core clean, simple, and low power. The datapath within a CRIB structure is purely combinational, eliminating most of the clocked elements in the core while keeping a fully synchronous yet high-frequency design. Experimental results match the IPC and cycle time of a baseline out-of-order design while reducing dynamic energy consumption by more than 60% in affected structures.

A position-insensitive finished store buffer

Gunadi

Lipasti

2007

This paper presents the Finished Store Buffer (FSB), an alternative, position-insensitive approach to building a scalable store buffer for an out-of-order processor. Exploiting the fact that only a small portion of in-flight stores are done executing (i.e., finished) and waiting for retirement, we are able to build a much smaller and more scalable store buffer. Our study shows that we need at most half of the number of entries in a conventional store queue if we buffer only the stores that have finished execution. Entries in the store buffer are allocated at issue and deallocated on retirement. A clever encoder circuit is used to provide positional searches without an explicitly positional queue structure. While reducing the access latency and power consumption significantly, our technique has virtually no detrimental effect on per-cycle performance (IPC).
