2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC) 2018
DOI: 10.1109/dac.2018.8465833

DNestMap: Mapping Deeply-Nested Loops on Ultra-Low Power CGRAs

Abstract: Coarse-Grained Reconfigurable Arrays (CGRAs) provide high-performance, energy-efficient execution of the innermost loops of an application. Most real-world applications, however, comprise deeply-nested loops with complex and often irregular control flow structures that cannot be mapped to CGRAs by existing compilers. This leads to excessive data transfer costs as the execution continuously alternates between the outer loop-nests on the host processor and the innermost loop on the CGRA accelerator. Moreover,…
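
The alternation between host and CGRA described in the abstract can be made concrete with a rough count of hand-offs. The sketch below is only an illustration under stated assumptions: the function name `cgra_invocations`, the trip counts, and the simplification that each iteration of a host-resident loop launches the CGRA exactly once are all hypothetical, and this is not the cost model used in the paper.

```python
from math import prod


def cgra_invocations(trip_counts, levels_on_cgra):
    """Count how often the host hands control to the CGRA (hypothetical model).

    trip_counts: iterations per loop level, outermost first.
    levels_on_cgra: how many of the innermost levels run on the CGRA.
    Assumption: every iteration of the loops left on the host launches
    the CGRA once, and each launch implies one round of data transfer.
    """
    depth = len(trip_counts)
    if not 1 <= levels_on_cgra <= depth:
        raise ValueError("levels_on_cgra must be between 1 and the nest depth")
    host_levels = trip_counts[: depth - levels_on_cgra]
    return prod(host_levels)  # prod of an empty sequence is 1


# Hypothetical three-level nest, outermost first.
nest = [64, 32, 16]
innermost_only = cgra_invocations(nest, 1)  # outer two loops stay on the host
whole_nest = cgra_invocations(nest, 3)      # the entire nest runs on the CGRA
print(innermost_only, whole_nest)
```

Under these assumptions, mapping only the innermost loop triggers one hand-off per iteration of the enclosing loops, while mapping the whole nest needs a single hand-off.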

Cited by 7 publications (5 citation statements)
References 15 publications
“…However, the traditional modulo scheduling (software pipelining) [16] for kernel mapping fails to convert nested loops and complex if-else structures to static code for CGRAs. Some software pipelining solutions or mapping techniques have been proposed to solve this problem [17], [18], but this work focuses on solutions at the architecture level.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, a homogeneous interconnect handles different data types (e.g., operands and predicates) the same way when ideally they should be treated differently to save energy. Local Memory: Configuration memory takes up significant power in architectures with spatial-temporal mapping given per-cycle reconfiguration [20]. Typically the opcode, constants, and router settings are encoded into a single configuration word in a homogeneous design.…”
Section: Homogeneity Leads To Inefficiency (mentioning)
confidence: 99%
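
The single configuration word mentioned in the excerpt above can be pictured as a packed bit field. The following is a minimal sketch, assuming a 32-bit word with hypothetical field widths (5-bit opcode, 16-bit constant, 11-bit router setting); the widths, field order, and function names are illustrative and are not taken from any of the cited architectures.

```python
# Hypothetical field widths; together they fill one 32-bit configuration word.
OPCODE_BITS, CONST_BITS, ROUTER_BITS = 5, 16, 11


def pack_config(opcode, const, router):
    """Pack opcode, constant, and router setting into one integer word."""
    fields = (("opcode", opcode, OPCODE_BITS),
              ("const", const, CONST_BITS),
              ("router", router, ROUTER_BITS))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    # Layout, most significant first: opcode | const | router
    return (opcode << (CONST_BITS + ROUTER_BITS)) | (const << ROUTER_BITS) | router


def unpack_config(word):
    """Recover (opcode, const, router) from a packed configuration word."""
    router = word & ((1 << ROUTER_BITS) - 1)
    const = (word >> ROUTER_BITS) & ((1 << CONST_BITS) - 1)
    opcode = word >> (CONST_BITS + ROUTER_BITS)
    return opcode, const, router


word = pack_config(3, 0x1234, 0x2A)
print(hex(word), unpack_config(word))
```

In this layout every word carries all three fields, so a processing element that needs no constant in a given cycle still stores and fetches the constant bits, which is one way to read the inefficiency the excerpt attributes to homogeneous designs.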
“…A solution adopted in many cases is to let the control flow be managed by a host processor. But this reduces greatly the possibilities to use the CGRA and increases the communication overhead, losing sometimes the benefit of the acceleration provided by the CGRA. Another approach is to provide the CGRA with extra hardware features to support the control flow.…”
[Table text from the citing paper that was interleaved with this excerpt, apparently a classification of mapping work by problem and method: Spatial mapping [23], [30], [31]: GA [19]; SA [32], [33]; ILP [23], [34], [35]. Temporal mapping [12], [16], [26], [36]-[40]: SA [22]; ILP [41]; B&B [42]; CP [43]; SAT [17]; SMT [44]. Binding [14], [24], [28], [45]-[47]: QEA [48]; SA [30], [49], [50]; ILP [15], [48]. Scheduling [24], [28], [36], [46], [48], [50]-[52]: ILP [15], [53].]
Section: B Control-flow Mapping (mentioning)
confidence: 99%