2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
DOI: 10.1109/hpca51647.2021.00042

Ultra-Elastic CGRAs for Irregular Loop Specialization

Cited by 35 publications (5 citation statements). References 47 publications.
“…Like recent near-data computing architectures [6,83,105,142,150], täkō adds programmable engines near caches to execute callbacks efficiently. In täkō, engines contain scheduling logic and a spatial dataflow fabric to run callbacks [43,59,103,132,138,143]. With this microarchitectural support, täkō gets close to the performance of fully specialized hardware: software programmability adds little overhead because data movement costs dominate and callbacks are short.…”
Section: Cache (mentioning; confidence: 99%)
“…Another notable work with architectural support for optimizing loop execution is the ultra-elastic CGRAs (UE-CGRAs) [24] that can efficiently execute loops with irregular control flow and memory accesses and inter-iteration loop dependencies. The solution co-designed across compiler, architecture, and VLSI accelerates true-dependency bottlenecks and reduces energy consumption by supporting fine-grain dynamic voltage and frequency scaling (DVFS) on individual PEs.…”
Section: Related Work (mentioning; confidence: 99%)
“…Case studies on processor architectures [1,12,15,25] reveal that improved performance and energy efficiency can be attained when loop-specific hardware optimizations are applied. A few recent CGRA architectures have introduced such architectural modifications to better support loop execution and reported good results [2,22,24,26].…”
Section: Introduction (mentioning; confidence: 99%)
“…With improvements to general purpose processors slowing, reconfigurable accelerators (a.k.a. dataflow accelerators [30, 41, 43, 45, 61–63], or CGRAs [16,17,26,34,35,57,60]) have become an increasingly favorable option for meeting the needs of data-processing workloads. Recently, multicore versions of these designs have seen commercial traction, particularly for use in datacenters (e.g.…”
Section: Introduction (mentioning; confidence: 99%)