Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

Kim, Wonsub; Choi, Yoonseo; Park, Hae-woo

doi:10.1145/2555289.2555314

Cited by 2 publications (3 citation statements)

References 13 publications

“…However, it is not easy to map a computation-intensive Data Flow Graph (DFG) onto a Reconfigurable Cell Array (RCA), because there are many constraints. In previous studies, researchers have presented a wide range of mapping algorithms based on a variety of CGRAs [2][3][4][5][6][7][8][9][10][11][12][13][14] . Yoon et al [2] proposed the spatial mapping algorithm, known as Split-Push Kernel Mapping (SPKM), to map several applications onto resource sharing and pipelining architecture.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient scheduling mapping algorithm for row parallel coarse-grained reconfigurable architecture

Chen, Wang, et al., 2021, Tsinghua Sci. Technol.

Row Parallel Coarse-Grained Reconfigurable Architecture (RPCGRA) offers maximum parallelism and programmable flexibility. However, designing an efficient algorithm to map diverse applications onto RPCGRA is difficult because of a number of RPCGRA hardware constraints. To solve this problem, the nodes of the data flow graph must be partitioned and scheduled onto the RPCGRA. In this paper, we present a Depth-First Greedy Mapping (DFGM) algorithm that simultaneously considers communication costs and the use times of the Reconfigurable Cell Array (RCA). DFGM outperforms level breadth mapping: the maximum improvement is 33% in RCA use times and 64.4% in non-original input and output times (for Discrete Cosine Transform 8 (DCT8) with a reconfigurable processing unit area of 56). Compared with level-based depth mapping, DFGM also obtains the lowest average RCA use times, non-original input and output times, and reconfiguration time.

Section: Introduction (mentioning)

confidence: 99%

“…Lee et al [8] and Jo et al [9] introduced approaches for supporting floating-point operations for CGRAs. Kim et al [10] proposed a fast modulo routing scheduling technique for mapping 3D graphics benchmarks onto CGRAs, which improved the compilation speed. Level-Breadth-Mapping (LBM) partitions the nodes by level.…”

Section: Introduction (mentioning)

confidence: 99%

Efficient scheduling mapping algorithm for row parallel coarse-grained reconfigurable architecture

Chen, Wang, et al., 2021, Tsinghua Sci. Technol.

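The Level-Breadth-Mapping (LBM) baseline cited above is described only as partitioning the DFG nodes by level. The following minimal Python sketch shows one common way to compute such levels, assuming ASAP (as-soon-as-possible) levels: source nodes sit at level 0 and every other node sits one level past its deepest predecessor. The function name `partition_by_level` and the example graph are hypothetical, and the sketch does not model RCA capacity, communication cost, or the other hardware constraints the cited papers consider.

```python
from collections import defaultdict, deque


def partition_by_level(nodes, edges):
    """Group DFG nodes by ASAP level (hypothetical helper, not from the cited papers).

    Sources get level 0; every other node gets 1 + the deepest level among its
    predecessors. Raises ValueError if the graph contains a cycle.
    """
    preds = defaultdict(list)
    succs = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)
        indegree[dst] += 1

    # Kahn-style traversal: a node is levelled only after all its predecessors.
    level = {}
    ready = deque(n for n in nodes if indegree[n] == 0)
    while ready:
        n = ready.popleft()
        level[n] = max((level[p] + 1 for p in preds[n]), default=0)
        for s in succs[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    if len(level) != len(nodes):
        raise ValueError("DFG contains a cycle; levels are undefined")

    # Levels are contiguous (0..max), so a plain list of partitions suffices.
    partitions = defaultdict(list)
    for n in nodes:  # keep input order within each level
        partitions[level[n]].append(n)
    return [partitions[i] for i in range(len(partitions))]


if __name__ == "__main__":
    dfg_nodes = ["a", "b", "c", "d", "e"]
    dfg_edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e")]
    print(partition_by_level(dfg_nodes, dfg_edges))
```

With these example edges, `a` and `b` share level 0, `c` is on level 1, and `d` lands on level 2 because its deepest predecessor `c` is on level 1, giving `[['a', 'b'], ['c'], ['d'], ['e']]`. A breadth-oriented mapper could then place each partition onto the array in turn; how LBM actually does so is not specified here.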
“…As the basic blocks in a CDFG are represented as DFGs, any of the known DFG mapping onto CGRA [5,16,3,9] can be applied to map the basic blocks. Mapping of the basic blocks is done independently, since execution of basic blocks are mutually exclusive.…”

Section: A. Problem Formulation (mentioning)

confidence: 99%

Efficient mapping of CDFG onto coarse-grained reconfigurable array architectures

Das, Martin, Coussy, et al., 2017, 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC)

In the approaching era of IoT, flexible and low-power accelerators have become essential to meet aggressive energy-efficiency targets. During the last few decades, Coarse Grain Reconfigurable Arrays (CGRAs) have demonstrated high energy efficiency as accelerators, especially for high-performance streaming applications. Existing CGRAs mostly rely on partial and full predication techniques to support conditional branches, and this inefficient architecture and mapping support for control flow limits CGRAs to accelerating either inner loop bodies only or loops transformed specifically for the target CGRA. This paper proposes a novel CGRA architecture with support for jump and conditional jump instructions and a lightweight global synchronization mechanism, enabling complete Control Data Flow Graph (CDFG) mapping in an ultra-low-power environment. The architecture is coupled with a complete design flow that efficiently maps applications with heavy control flow, starting from a generic C language description. The proposed mapping approach reduces the impact of the wasteful instruction issues found in conventional predication approaches, providing average energy improvements of 1.44x and 1.6x compared with state-of-the-art partial and full predication techniques, respectively. Moreover, when executing applications with heavy control flow, the proposed method achieves an average speed-up of up to 21x and an energy improvement of up to 50.42x with respect to sequential execution on a low-power embedded CPU, demonstrating its suitability for next-generation IoT applications.
