Optimal clock period clustering for sequential circuits with retiming

Pan, Peichen; Karandikar, A.; Liu, C.L.

doi:10.1109/43.703830

Cited by 57 publications

(68 citation statements)

References 27 publications

“…Since a gate (with only one output) only gives a directed tree with forbidden edges, Lemma 3 subsumes previous results [18,15,4]. However, our result also shows that it can be extended to circuits with complex blocks such as multipliers and adders where each output depends on all inputs.…”

Section: Lower and Upper Bounds Of Clock Period (supporting)

confidence: 63%

Retiming for wire pipelining in System-On-Chip

Lin

Zhou

2003

ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486)

At the integration scale of System-On-Chips (SOCs), the conflicts between communication and computation will become prominent even on a chip. A big fraction of system time will shift from computation to communication. In synchronous systems, a large amount of communication time is spent on multiple-clock period wires. In this paper, we explore retiming to pipeline long interconnect wires in SOC designs. Behaviorally, it means that both computation and communication are rescheduled for parallelism. The retiming is applied to a netlist of macro-blocks, where the internal structures may not be changed and flip-flops may not be able to be inserted on some wire segments. This problem is different from that on a gate level netlist and is formulated as a wire retiming problem. Theoretical treatment and a polynomial time algorithm are presented in the paper. Experimental results showed the benefits and effectiveness of our approach.

1 Introduction

With a great market drive for high performance and integration, operating frequencies and chip sizes of SOCs are dramatically increasing. Industry data showed that the frequencies of high-performance ICs approximately doubled every process generation and the die size also increased by about 25% per generation. With such short clock periods, the communication among different blocks on a SOC circuit of ever increasing complexity is becoming a bottleneck: even with interconnect optimization techniques such as buffer insertion, the delay from one block to another may be longer than one clock period, and multiple clock cycles are generally required to communicate such a global signal.

This trend has motivated recent research within Intel [2] and IBM [11] on how to insert flip-flops on a given net if the communication between the pins requires multiple clock cycles. However, inserting flip-flops within a circuit will change its functionality, and inserting arbitrary number of them on a net without considering global consistency will destroy the correctness of a circuit.

Retiming [14] is a traditional sequential optimization technique that moves flip-flops within a circuit without destroying its functionality. In traditional settings, retiming was used only on gate level netlists and in most cases delays were dominated by gate delays; that is, wire delays were ignored. With increasing communication delays as mentioned above, this paper explores the alternative utility of retiming; that is, besides its computational function, a flip-flop can be used to fulfill communication buffering requirements.

Since dominant wire delays can only happen on global wires, we solve the problem at the chip level, that is, the design we deal with is a netlist of macro-blocks. The wires within a block are relatively much shorter thus do not need multiple clock periods for propagation. In SOC design, many of these macro-blocks are IP (Intellectual Property) cores. Some of these blocks may be combinational circuits, and others sequential. In our problem formulation, we will use timing macro-models to mo...
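The excerpt above describes retiming as moving flip-flops without changing functionality. As a rough illustration only, the sketch below uses the standard gate-level formulation, in which an edge from u to v carrying w registers becomes w + r(v) - r(u) under a retiming label r, and the clock period is the longest register-free path delay. It is not the wire retiming algorithm of the citing paper; the four-node ring, its delays, and the labels in `r` are hypothetical values chosen for the example.

```python
from collections import defaultdict, deque

def retime(edges, r):
    """Apply w_r(u, v) = w(u, v) + r[v] - r[u] to every edge."""
    retimed = [(u, v, w + r[v] - r[u]) for u, v, w in edges]
    if any(w < 0 for _, _, w in retimed):
        raise ValueError("illegal retiming: an edge would hold a negative register count")
    return retimed

def clock_period(delays, edges):
    """Longest delay over paths that use only zero-register edges."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in delays}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)
            indeg[v] += 1
    # arrival[v]: longest register-free path ending at v, including v's own delay
    arrival = dict(delays)
    ready = deque(v for v in delays if indeg[v] == 0)
    visited = 0
    while ready:  # topological sweep over the zero-register subgraph
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delays[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if visited != len(delays):
        raise ValueError("combinational cycle: some cycle has no register")
    return max(arrival.values())

# Hypothetical ring a -> b -> c -> d -> a with two registers on the d -> a edge.
delays = {"a": 1, "b": 2, "c": 3, "d": 2}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]
r = {"a": 0, "b": 0, "c": 1, "d": 1}  # hypothetical retiming labels

before = clock_period(delays, edges)
retimed_edges = retime(edges, r)
after = clock_period(delays, retimed_edges)
print(before, after)
```

In this toy ring the labels move one register from the d -> a edge onto the b -> c edge, so the total register count on the cycle is unchanged while the longest register-free path is split in two.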

“…Therefore, (7) can be incorporated into (6). It has been shown in the literature [18,15,4] that when the forbidden edges are introduced only by gates, the above lower bounds are tight. In fact, we can generalize the result to the following lemma.…”

Section: Lower and Upper Bounds Of Clock Period (mentioning)

confidence: 99%

Retiming for wire pipelining in System-On-Chip

Lin

Zhou

2003

ICCAD-2003. International Conference on Computer Aided Design (IEEE Cat. No.03CH37486)

“…This heuristic net weighting technique was developed for combinational timing-driven partitioning, but has been adopted by researchers looking at sequential-timing driven partitioning as well [Lim00,PKL98]. However, the complexity of partitioning under sequential flexibility was previously not made clear, and the justification for adopting such a heuristic technique was not based on theoretical grounds.…”

Section: Introduction (mentioning)

confidence: 99%

“…However, recent work has dealt with partitioning for performance in a sequential setting, allowing for retiming and clock skew scheduling to take place [Lim00,PKL98].…”

Section: Introduction (mentioning)

confidence: 99%

Integration of Physical Design and Sequential Optimization

Chong

2006

This work examines the interaction between the physical design of digital integrated circuits and sequential optimization techniques used for performance enhancement. In particular, the integration of floorplanning and placement with retiming and clock skew scheduling is explored. A theoretical result is given which addresses the computational complexity of circuit partitioning under constraints derived from sequential optimization; this motivates the need for heuristic approaches to the related placement problem. Another theoretical result provides a characterization of the feasible retimings of a sequential circuit; this result is used to motivate an effective method for floorplanning integrated with sequential optimization. Practical techniques for using sequential slack to drive standard-cell placement are shown here; experiments demonstrate significant improvement in final design performance using these methods. Another part of this work examines how the role of sequential optimization and physical design changes when the design allows for asynchronous or latency-insensitive communication between modules. A theoretical result relating to the problem of clock tree implementation for clock skew scheduling under process variation is given. Finally an experimental technique for floorplanning using nonlinear programming is demonstrated.

“…It relocates registers to reduce cycle time while preserving the functionalities of circuits. Much effort has been made to apply this technique in different areas like power reduction [4], [5], testability [6], [7], logic resynthesis [8], circuit partitioning [9]-[11] and physical planning [12]. Some extended its applicability to large practical circuits efficiently [13]-[20].…”

Section: Introduction (mentioning)

confidence: 99%

Wire Retiming Problem With Net Topology Optimization

Tong

Young

Chu

et al. 2007

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.

Abstract: In this paper, we study the retiming problem of sequential circuits with net topology optimization. Both interconnect and gate delay are considered in retiming. Most previous retiming algorithms have assumed ideal conditions for the nonlogical portions of data paths, which are not sufficiently accurate to be used in high performance circuits today. In our modeling, we assume that the delay of a wire is directly proportional to its length. This assumption is reasonable since the quadratic component of a wire delay is significantly smaller than its linear component when the more accurate Elmore delay model is used. A simple experiment was conducted to illustrate the validity of this assumption. We present two approaches to solve the retiming problem, both of which have polynomial time complexity. The first one can compute the optimal clock period while the second one is an improvement over the first one in terms of practical applicability. The second approach gives solutions very close to the optimal (0.06% more than the optimal on average) but in a much shorter runtime. The optimally retimed circuit will then be realized physically by placing the registers and finding the net topologies. In contrast to many previous works [1], [2] that performed simple calculations to determine the register positions, our approach can preserve the optimal clock period obtained by the retiming step and utilize as few registers as possible. Minimization of register number saves both area and power in register and clock loading. Our topology optimization step is shown to be optimal for nets with four or fewer pins and this type of nets constitutes over 90% of the nets in a sequential circuit on average.

Using the ISCAS89 benchmark, we tested our algorithm with a 0.35 µm CMOS standard cell library. Silicon Ensemble was used to layout the design with row utilization of 50%. Experimental results showed that our algorithm could find the best sharing of registers for a net in most of the cases, i.e., using the minimum number of registers while preserving the target clock period obtained by the retiming step, within a minute run on an Intel Pentium IV 1.5 GHz PC with 512 MB RAM.
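The abstract's remark about the linear and quadratic components of wire delay can be made concrete with the textbook Elmore expression for a single driven wire. This is a sketch under assumed notation, none of which appears in the abstract: a driver of resistance $R_d$, a uniform wire of length $L$ with per-unit-length resistance $r$ and capacitance $c$, and a load capacitance $C_L$.

```latex
\begin{align*}
t_{\text{Elmore}}
  &= R_d\,(cL + C_L) + rL\left(\tfrac{1}{2}cL + C_L\right) \\
  &= \underbrace{\tfrac{1}{2}\,rc\,L^{2}}_{\text{quadratic}}
   + \underbrace{(R_d\,c + r\,C_L)\,L}_{\text{linear}}
   + \underbrace{R_d\,C_L}_{\text{constant}}
\end{align*}
% The quadratic term is small relative to the linear term when
%   \tfrac{1}{2} rc L^2 \ll (R_d c + r C_L) L
%   \iff L \ll 2\left(\frac{R_d}{r} + \frac{C_L}{c}\right).
```

Under these assumptions, a delay model proportional to length is a reasonable approximation for wires well below that length; whether a given technology satisfies the condition depends on its actual parameters.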
