2015
DOI: 10.1145/2754930
Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures

Abstract: There has been recent interest in accelerating nonvectorizable workloads with spatially programmed architectures designed to efficiently exploit pipeline parallelism. Such an architecture faces two main problems: how to efficiently control each processing element (PE) in the system, and how to facilitate inter-PE communication without the overheads of a traditional coherent shared memory. In this article, we explore solving these problems using triggered instructions and latency-…
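The abstract names triggered instructions as the PE control mechanism. As a rough illustration of the general idea (not the paper's ISA): a triggered-instruction PE has no program counter; each instruction carries a trigger, a predicate over local state and channel occupancy, and may fire whenever its trigger holds. All names below are illustrative.

```python
# Hedged sketch of a triggered-instruction control model: instructions are
# guarded by triggers instead of being sequenced by a program counter.
# Everything here (PEState, TriggeredInstruction, step) is invented for
# illustration and is not the paper's actual ISA.
from dataclasses import dataclass, field
from typing import Callable, Dict, List
from collections import deque

@dataclass
class PEState:
    preds: Dict[str, bool] = field(default_factory=dict)   # predicate registers
    chans: Dict[str, deque] = field(default_factory=dict)  # input/output queues

@dataclass
class TriggeredInstruction:
    trigger: Callable[[PEState], bool]  # when may this instruction fire?
    action: Callable[[PEState], None]   # datapath op + state updates

def step(state: PEState, program: List[TriggeredInstruction]) -> bool:
    """One scheduler step: fire the first instruction whose trigger holds."""
    for instr in program:
        if instr.trigger(state):
            instr.action(state)
            return True
    return False  # no trigger holds; the PE stalls this cycle

# Example: forward a token from 'in' to 'out' whenever input data is present.
state = PEState(chans={"in": deque([1, 2, 3]), "out": deque()})
program = [
    TriggeredInstruction(
        trigger=lambda s: len(s.chans["in"]) > 0,
        action=lambda s: s.chans["out"].append(s.chans["in"].popleft()),
    )
]
while step(state, program):
    pass
print(list(state.chans["out"]))  # [1, 2, 3]
```

Because control is a set of guarded rules rather than a sequential instruction stream, the PE reacts to channel availability directly, which is what makes this style a fit for latency-insensitive communication.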

Cited by 14 publications (9 citation statements)
References 24 publications
“…Spatial or dataflow computing distributes the control and eliminates the requirement and expectation of static reasoning: each operation executes as soon as all of its inputs arrive, and physical operators pass along control "tokens" and the data they produce [16]. Different authors exploited latency-insensitive protocols to construct dynamic, high-performance circuits.…”
Section: Related Work
confidence: 99%
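The firing rule quoted above can be sketched concretely: an operator executes as soon as tokens are present on all of its inputs, then passes a result token to its consumers. The `Operator` class and graph below are illustrative, not taken from the cited work.

```python
# Minimal sketch of the dataflow firing rule: fire when every input queue
# holds a token; forward the result token to downstream operators.
from collections import deque

class Operator:
    def __init__(self, fn, n_inputs):
        self.fn = fn
        self.inputs = [deque() for _ in range(n_inputs)]
        self.consumers = []  # list of (operator, input_port) pairs

    def ready(self):
        return all(q for q in self.inputs)  # tokens on every input?

    def fire(self):
        args = [q.popleft() for q in self.inputs]
        result = self.fn(*args)
        for op, port in self.consumers:
            op.inputs[port].append(result)  # pass the token along
        return result

# (a + b) * c as a two-operator dataflow graph
add = Operator(lambda a, b: a + b, 2)
mul = Operator(lambda x, c: x * c, 2)
add.consumers.append((mul, 0))

add.inputs[0].append(3)  # a = 3
add.inputs[1].append(4)  # b = 4
mul.inputs[1].append(5)  # c = 5

results, fired = [], True
while fired:
    fired = False
    for op in (add, mul):
        if op.ready():
            results.append(op.fire())
            fired = True
print(results[-1])  # 35
```

Note that no global program counter sequences the two operators; execution order emerges entirely from token availability, which is the "distributed control" the citation statement describes.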
“…Spatial computation is a paradigm that breaks the application's dataflow into regions, which are mapped onto subsets of the hardware resources (functional units, the interconnection network, and storage) in the form of a producer-consumer pipeline [7]. Spatial computation has some similarity with multicore models.…”
Section: Spatial Computation
confidence: 99%
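The producer-consumer pipelining described above, with regions streaming values to their successors much as cores would in the multicore analogy, can be sketched with bounded queues between stages. The stage functions and queue sizes below are illustrative assumptions.

```python
# Hedged sketch of regions mapped as a producer-consumer pipeline: each
# region runs concurrently and streams results downstream through a
# bounded queue, with None as an end-of-stream sentinel.
import queue
import threading

def region(fn, q_in, q_out):
    """One pipeline region: consume, compute, produce, until end of stream."""
    while True:
        x = q_in.get()
        if x is None:          # propagate the end-of-stream sentinel
            q_out.put(None)
            break
        q_out.put(fn(x))

q1, q2, q3 = queue.Queue(4), queue.Queue(4), queue.Queue(4)
stages = [
    threading.Thread(target=region, args=(lambda x: x + 1, q1, q2)),
    threading.Thread(target=region, args=(lambda x: x * 2, q2, q3)),
]
for t in stages:
    t.start()
for v in [1, 2, 3]:
    q1.put(v)
q1.put(None)
for t in stages:
    t.join()

out = []
while True:
    v = q3.get()
    if v is None:
        break
    out.append(v)
print(out)  # [4, 6, 8]
```

The bounded queues stand in for latency-insensitive channels: a fast producer simply blocks when its consumer falls behind, with no shared-memory coherence involved.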
“…Fig. 2(a)-(c) shows the patterns of a program, totally parallelizable and loop-data-dependent, in which circles labeled X_y represent instruction X of iteration y, and squares labeled x_y represent data read by instruction X_y [7]. Fig.…”
Section: Processing Element Array, Processing Element
confidence: 99%
“…For example, the convolution kernel (which accounts for the majority of execution time in ResNeXt [1]) is a 7-deep perfectly nested loop. Variations of dataflow accelerators, like systolic arrays (e.g., the Tensor Processing Unit), coarse-grained reconfigurable arrays (CGRAs), and spatial architectures are repeatedly being demonstrated as promising accelerators for these power- and performance-critical loops [2][3][4][5][6][7][8][9][10]. As shown in Figure 1, dataflow accelerators in general comprise an array of processing elements, aka PEs (function units with little local control), and noncoherent scratchpad memories (SPM) that allow concurrent execution and explicit data management.…”
Section: Introduction
confidence: 99%