This paper presents the first in-depth study on applying dual-Vdd buffers to buffer insertion and multi-sink buffered tree construction for power minimization under a delay constraint. To tackle the dramatic increase in complexity caused by considering delay and power simultaneously and by the larger set of buffer choices, we develop a sampling-based method for propagating sub-solutions (i.e., options) and a balanced search tree-based data structure for option pruning. We obtain a 17x speedup with little loss of optimality compared to exact option propagation. Moreover, compared to buffer insertion with single-Vdd buffers, dual-Vdd buffers reduce power by 23% at the minimum delay specification. In addition, compared to the delay-optimal tree using single-Vdd buffers, our power-optimal buffered tree reduces power by 7% and 18% at the minimum delay specification when single-Vdd and dual-Vdd buffers are used, respectively.
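To make the idea of option pruning concrete, the following is a minimal sketch, not the paper's method. It assumes each option is a hypothetical triple of downstream capacitance, required arrival time, and power, and that an option can be discarded when another option is at least as good in all three. It uses a plain quadratic scan rather than the balanced search tree-based structure the abstract describes, and it does not model the sampling step.

```python
from typing import List, NamedTuple


class Option(NamedTuple):
    """Hypothetical sub-solution at a tree node (field names are assumptions)."""
    cap: float    # downstream capacitance seen at this node (lower is better)
    rat: float    # required arrival time at this node (higher is better)
    power: float  # power of the buffered subtree (lower is better)


def dominates(a: Option, b: Option) -> bool:
    """True if a is no worse than b in every dimension and is not identical to b."""
    return a != b and a.cap <= b.cap and a.rat >= b.rat and a.power <= b.power


def prune(options: List[Option]) -> List[Option]:
    """Keep only non-dominated options, using a simple O(n^2) scan."""
    unique = list(dict.fromkeys(options))  # drop exact duplicates, keep order
    return [o for o in unique if not any(dominates(other, o) for other in unique)]


if __name__ == "__main__":
    candidates = [
        Option(cap=1.0, rat=10.0, power=5.0),
        Option(cap=2.0, rat=9.0, power=6.0),   # worse than the first in all three
        Option(cap=2.0, rat=12.0, power=4.0),
        Option(cap=0.5, rat=8.0, power=7.0),
    ]
    for opt in prune(candidates):
        print(opt)
```

Adding power as a third dimension means far fewer options are dominated than in delay-only buffer insertion, which is consistent with the complexity growth the abstract sets out to address.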
For nanometer design, conventional timing analysis may generate over-optimistic results on criticality-dependent paths. A late arrival time at the data input of a flip-flop lengthens the propagation delay from the clock pin to the data output of this flip-flop, thus degrading the timing margins of paths launching from this flip-flop. To remove the optimism, in this paper, we first propose a simple yet effective triangle model to characterize the criticality-dependency effect. Then, we devise a novel criticality-dependency-aware timing analysis flow, which is seamlessly integrated with the common static timing analysis flow. Experimental results show that our approach can effectively analyze the criticality-dependency effect: based on the proposed triangle model, we can accurately identify all timing-risky flip-flops and capture the induced timing margin degradation.
In this paper we introduce the concept of zero-change transformations to quantify the suboptimality of existing placers. Given a netlist and its placement from a placer, we formally define a class of netlist transformations that produce different netlists from the given netlist but have the same Half-Perimeter Wire Length (HPWL). Furthermore, the optimal HPWL value of the new netlists is no less than that of the original netlist. By applying our transformations and re-executing the placer, we can interpret any deviation in HPWL as a lower bound to the deviation from the optimal HPWL value. Such deviation is a measure of suboptimality. Using these transformations, the suboptimality of several existing academic and industrial placers is studied on the IBM benchmarks. Our results show that current placers are suboptimal for zero-change transformations with deviations in HPWL by up to 32% on the IBM (version 1) benchmarks. The specific nature of our transformations also pinpoints possible directions for improvement in existing placers.
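As a rough illustration of the quantities involved, the sketch below computes HPWL for a placement and the relative deviation between two HPWL values. The data layout (nets as lists of cell names, a placement as a name-to-coordinate map) and the function names are assumptions made for this example. The zero-change transformations themselves are not reproduced here.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[float, float]]  # cell name -> (x, y), assumed layout
Netlist = List[List[str]]                   # each net is a list of cell names


def hpwl(netlist: Netlist, placement: Placement) -> float:
    """Sum over nets of the half-perimeter of each net's bounding box."""
    total = 0.0
    for net in netlist:
        xs = [placement[cell][0] for cell in net]
        ys = [placement[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def relative_deviation(hpwl_reference: float, hpwl_replaced: float) -> float:
    """Relative HPWL increase after re-running a placer on a transformed netlist."""
    return (hpwl_replaced - hpwl_reference) / hpwl_reference


if __name__ == "__main__":
    placement = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 1.0),
                 "d": (1.0, 5.0), "e": (2.0, 2.0)}
    netlist = [["a", "b"], ["c", "d", "e"]]
    reference = hpwl(netlist, placement)
    print(reference)
    # 15.0 stands in for the HPWL a placer might return on a transformed netlist.
    print(relative_deviation(reference, 15.0))
```

Because the transformed netlist admits the original placement at the same HPWL, any increase reported by the placer after re-execution can be read as a lower bound on its gap from the optimum, as the abstract argues.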
We streamline and extend APlace, the general analytic placement engine based on ideas of Naylor et al. [7] and described in [3,4,5]. Previous work explored the adaptability of APlace to multiple contexts with good quality of results. For example, the framework was extended to traditional wirelength-driven standard-cell placement in [3,5], achieving good results in placed HPWL and routed final wirelength. The framework was also extended to top-down multilevel placement, congestion-directed placement, mixed-size placement, timing-driven placement, I/O-core co-placement and constraint handling for mixed-signal contexts [3,4,5]. In this work, we have modified the implementation of APlace for speed and scalability. Improvements have been made in clustering, legalization and detailed placement strategies, as well as via a distributable solution framework for both global and detailed placement phases.
This work first presents an analytical repeater insertion method that optimizes power under a delay constraint for a single net. The method finds the optimal repeater insertion lengths, repeater sizes, and Vdd and Vth levels for a net with a delay target, and it reduces power by more than 50% compared with previous work that does not consider Vdd and Vth optimization. This work further presents the power savings when multiple Vdd and Vth levels are used in repeater insertion at the full-chip level. Compared to the case with the single Vdd and Vth suggested by ITRS, optimized dual Vdd and dual Vth reduce overall global interconnect power by 47%, 28% and 13% for the 130nm, 90nm and 65nm technology nodes, respectively, but extra Vdd or Vth levels give only marginal improvement. We also show that an optimized single Vth reduces interconnect power almost as effectively as dual Vth does, in contrast to the need for dual Vth in logic circuits.