Architecture-level synthesis for automatic interconnect pipelining

Cong, Jason; Fan, Yiping; Zhang, Zhiru

doi:10.1145/996566.996731

Cited by 22 publications

(17 citation statements)

References 12 publications


“…The first set of seven DFGs are extracted from MediaBench [16], then scheduled, bound, and placed in a 3×3 RDR-based architecture. The synthesis results for four different architecture-algorithm pairs, RDR/MCAS [7], RDR-Pipe/MCAS-Pipe [8], RDR-GRS/ILP [9] and RDR-GRS/RSS, are shown in Table I. The second and third column show the number of nodes and data transfers of the test case, respectively.…”

Section: B Experimental Results (mentioning)

confidence: 99%

“…Therefore, the number of required wires and register pairs in RDR/MCAS is lower-bounded by the number of the maximum possible concurrent data transfers at a cycle. Later, an extension named the RDR-Pipe/MCAS-Pipe is proposed in [8]. RDR-Pipe allows data transfers with the identical source-destination pair to share the same wires by inserting extra pipeline registers as intermediate stops.…”

Section: Introduction (mentioning)

confidence: 99%
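The lower bound mentioned in the citation statement above (wires needed is at least the peak number of concurrent data transfers in any cycle) can be illustrated with a minimal sketch. The representation of a transfer as an inclusive `(start_cycle, end_cycle)` pair and the function name `peak_concurrent_transfers` are assumptions made for illustration; they are not taken from the cited papers.

```python
from collections import Counter


def peak_concurrent_transfers(transfers):
    """Return the largest number of transfers active in any single cycle.

    transfers: iterable of (start_cycle, end_cycle) pairs, both inclusive.
    This tuple layout is a hypothetical representation for illustration.
    """
    active = Counter()
    for start, end in transfers:
        # A multicycle transfer occupies its wire in every cycle it spans.
        for cycle in range(start, end + 1):
            active[cycle] += 1
    return max(active.values(), default=0)


# Three transfers overlap in cycle 2, so at least three wires are needed
# when every concurrent transfer requires its own wire.
schedule = [(0, 2), (1, 3), (2, 4), (5, 6)]
print(peak_concurrent_transfers(schedule))
```

This only computes the bound for a fixed schedule; it does not model the wire sharing through pipeline registers that the statement attributes to RDR-Pipe.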


Simultaneous data transfer routing and scheduling for interconnect minimization in multicycle communication architecture

Hong; Huang (2009)

2009 Asia and South Pacific Design Automation Conference

In deep submicron technology, wire delay is no longer negligible and is gradually becoming a dominant factor in system performance. Several state-of-the-art architectural synthesis flows have already adopted the distributed register architecture to cope with the increasing wire delay by allowing multicycle communication. In this paper, we formulate channel and register allocation within a refined regular distributed register architecture, named RDR-GRS, as a problem of simultaneous data transfer routing and scheduling for minimizing global interconnect resources. We also present an innovative algorithm with both spatial and temporal considerations. It features both a concentration-oriented path router that gathers wire-sharable data transfers and a channel-based time scheduler that resolves contention for wires in a channel, which operate in the spatial and temporal domains, respectively. The experimental results show that the proposed algorithm can significantly outperform existing related works.

I Introduction

As technology proceeds into the deep-submicron (DSM) era, interconnect delay is becoming unavoidable due to resistance-capacitance delay, coupling effects, inductance, multiple-gigahertz operating frequencies, and so on [1]. In architectural synthesis, the maximum combined delay of the functional units (FUs) and their associated wires determines the system speed. If the synthesis flow neglects the delays introduced by long wires (especially global interconnects), their impact after physical floorplanning is very likely to degrade overall system performance through an unexpectedly longer clock cycle time. To solve this problem, [2][3][4] propose synthesis flows that estimate long interconnect delays more accurately by applying preliminary floorplanning, and thereby obtain better synthesis results.

Typically, a centralized register (CR) architecture is presumed in high-level synthesis. In a CR-based architecture, an FU is expected to access any register within one clock cycle. Although device speed generally increases as the manufacturing process advances, wire delay does not scale as well as feature size. Consequently, global wire delay gradually dominates and significantly lengthens the cycle time. Hence, [5][6][7][8][9][10] propose distributed register (DR) architectures to overcome this issue. In a DR-based architecture, the whole system is partitioned into several clusters, and each cluster contains its own local FUs and registers. As a result, the inter-cluster interconnect delay can be isolated from the intra-cluster delay. The latter includes the local wire delay within the same cluster and is assumed to be shorter than a single cycle, while the former is the global data transfer delay between different clusters and is allowed to complete in multiple cycles. Accordingly, the DR architecture not only alleviates the increase in cycle time due to long wire delay but also enables simultaneous computation and communication.

Though allowing multicycle global data transfer can reduce the impact on system spe...
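The multicycle communication idea described in the introduction above can be sketched numerically. Everything in this example is an assumption made for illustration: clusters are placed on a grid, wire delay is taken to grow linearly with Manhattan distance, and the names `transfer_cycles`, `hop_delay_ps`, and `clock_ps` are hypothetical. The cited papers may use a different delay model.

```python
def transfer_cycles(src, dst, hop_delay_ps, clock_ps):
    """Cycles needed to move a value from cluster src to cluster dst.

    src, dst: (row, col) cluster coordinates on a hypothetical grid.
    hop_delay_ps: assumed wire delay between adjacent clusters, in ps.
    clock_ps: clock period, in ps.
    """
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    if hops == 0:
        # Intra-cluster access: assumed to fit inside the compute cycle.
        return 0
    wire_delay_ps = hops * hop_delay_ps
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-wire_delay_ps // clock_ps)


# Corner-to-corner transfer on a 3x3 grid: 4 hops * 600 ps = 2400 ps,
# which takes 3 cycles at a 1000 ps clock period.
print(transfer_cycles((0, 0), (2, 2), hop_delay_ps=600, clock_ps=1000))
```

Under this model the clock period is set by local logic only, and a long transfer simply costs more cycles instead of stretching the cycle time.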


“…In fact, since the feature size of CMOS devices is continuously decreasing and more functionality is integrated on a chip, the length and number of global interconnects tend to increase [7]. Consequently, in future nanometer designs it will be impossible to carry signal across the chip within a single clock cycle and multi-cycle cross-chip communication becomes necessary, so that cross-chip interconnect is removed from all the timing constraints, and the chip speed is determined by the most critical intra-block/local combinational path, in order to continue employing higher frequencies [4], [5]. Insertion of sequential elements in interconnect lines, a concept that has become known as interconnect pipelining, is one feasible solution for modern nanometer technologies.…”

Section: Introduction (mentioning)

confidence: 99%

“…In [11], a floor-planning methodology, which considers interconnect pipelining and its impact on performance using the IPC sensitivity models is described. The authors of [5] explored the possibilities of sharing interconnect pipelining to reduce wiring overheads. And, [6] provides two techniques to deal with the short path constraint of latch based wire pipelining.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of Power Consumption and BER of Flip-flop Based Interconnect Pipelining

Roy; Chowdhury (2007)

2007 Design, Automation & Test in Europe Conference & Exhibition

“…Nanometer scale process technologies enable integration of billions of transistors with multiple-gigahertz operating frequencies [4]. However, continuous shift towards design in nanometer scale has been increasing the gap between device and wire delays, especially the global interconnect delays, which do not scale well with the feature size [8].…”

Section: Introduction (mentioning)

confidence: 99%

Bit Error Rate Analysis for Flip-flop and Latch Based Interconnect Pipelining

Chowdhury (2006)

2006 13th IEEE International Conference on Electronics, Circuits and Systems

As integrated circuit technology enters the interconnect-centric nanometer regime, it will be impossible to carry cross-chip signals in a single clock cycle, and interconnect pipelining becomes an acceptable solution beyond traditional buffer-insertion-based interconnect systems. This paper performs a detailed analysis of the bit error rate (BER) of two kinds of interconnect pipelining approaches and finds that the BER is unusually high in some cases. The cause of the high BER is analyzed, and a method to deal with it is proposed. A comparative study of the two interconnect pipelining approaches is also presented, which will help in exploring trade-offs between the number of sequential elements inserted and the probability of bit error during data transmission.
