Abstract: Circuit degradation due to bias temperature instability (BTI) can lead to timing failures in digital circuits. We develop BTI-aware designs based on variable latency units (VLUs), with a novel scheme for implementing multioutput hold logic for VLUs. A key observation is the identification and exploitation of specific supersetting patterns in the two-dimensional space of circuit frequency and aging. The multioutput hold logic scheme is used in conjunction with an adaptive body bias framework to achieve high performance, and the design can be easily incorporated into traditional synthesis flows. Compared with a conventional combinational BTI-resilience scheme, our design reduces area by 9.2% and improves throughput by 30.0%.

§1. INTRODUCTION

Bias temperature instability (BTI) [1], in the form of negative BTI (NBTI) in PMOS transistors and positive BTI (PBTI) in NMOS transistors, is a significant concern in nanoscale circuits. BTI causes transistor threshold voltages to shift over time, and the resulting delay increases can cause a circuit to fail its timing specifications as it ages.

Published approaches for enhancing BTI resilience include transistor sizing, logic resynthesis, and postsilicon tuning. These methods target conventional synchronous designs, in which the worst-case delay determines the clock period. This work addresses the case where the clock period is based on average-case rather than worst-case computations in a circuit, an approach that leads to improved data throughput.

Within the synchronous paradigm, two classes of techniques have been proposed for exploiting average-case computations: variable-latency units [2]-[4] and error detection-correction units [5]. Our work focuses on the design of BTI-resilient circuits using variable latency units (VLUs).
Unlike conventional combinational circuits, which complete each operation within one clock cycle, VLUs allow a combinational computation to complete in a variable integer number of clock cycles. By letting high-probability operations complete in a single cycle while rarer ones use multiple (typically two) cycles, the average cycle time may be shorter than that of the conventional implementation, implying that the throughput of a VLU may be significantly higher.

As an illustration of a VLU, consider the 6-bit ripple carry adder (RCA) shown in Figure 1, which consists of six full adders. Assuming unit gate delays, the conventional single-cycle fixed-latency implementation has a cycle time T_clk = 13 units, equal to the delay of its longest path, corresponding to a throughput η_1 = 1/13. The VLU implementation of this adder operates at a reduced cycle time, T_clk < 13. For T_clk = 9, assuming that all primary input signals are mutually independent and each has a signal probability of 50%, 18.75% of the input patterns violate T_clk, and the VLU allows these to complete execution in two cycles. Under the 50% assumption above, each pattern is equip...
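The throughput comparison for the 6-bit RCA example can be checked with a short calculation. The sketch below is a minimal illustration, not the paper's own derivation. It assumes that every pattern violating T_clk takes exactly two cycles, that all other patterns take one cycle, and that any delay or cycle overhead of the hold logic is ignored. The function name `vlu_throughput` is hypothetical. Under these assumptions, η_VLU = 1 / (T_clk × [(1 − P)·1 + P·2]), where P is the fraction of violating patterns.

```python
from fractions import Fraction


def vlu_throughput(t_clk, p_two_cycle):
    """Throughput (operations per unit time) of a hypothetical VLU in which a
    fraction p_two_cycle of input patterns needs two cycles and the rest need
    one. Hold-logic overhead is ignored (assumption, not from the source)."""
    avg_cycles = (1 - p_two_cycle) * 1 + p_two_cycle * 2
    return 1 / (t_clk * avg_cycles)


# Conventional fixed-latency adder: one cycle equal to the 13-unit longest path.
eta_1 = Fraction(1, 13)

# VLU adder from the text: T_clk = 9 units, 18.75% (= 3/16) of patterns take two cycles.
eta_vlu = vlu_throughput(Fraction(9), Fraction(3, 16))

print(f"eta_1   = {eta_1} ~ {float(eta_1):.4f}")
print(f"eta_VLU = {eta_vlu} ~ {float(eta_vlu):.4f}")
print(f"ratio   = {float(eta_vlu / eta_1):.3f}")
```

Exact fractions are used so the comparison carries no floating-point rounding. Under the stated assumptions, the average time per operation is 9 × 19/16 = 10.6875 units, giving η_VLU = 16/171 ≈ 0.0936. That is roughly 21.6% above η_1 = 1/13 ≈ 0.0769. Any hold-logic overhead would reduce this figure.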