2018
DOI: 10.1145/3157669

Memory-Constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms

Abstract: The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task, and data-level parallelism to meet the throughput requirements of digital signal processing (DSP) applications. Moreover, in the presence of system-level memory constraints, hand optimization of code to satisfy these requirements is inefficient and error-prone, and can therefore greatly slow down development time or result in highly u…

Cited by 9 publications (7 citation statements)
References 31 publications
“…However, their presence must be taken into account by some forms of analysis and optimization. For example, self-loop edges in general limit the amount of data parallelism that can be exploited when scheduling a given actor (e.g., see Lin et al., 2018).…”
Section: Background and Related Work (mentioning, confidence: 99%)
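The self-loop limitation can be made concrete with a small sketch. Assume an SDF self-loop edge on which the actor consumes and produces `rate` tokens per firing and which initially holds `delay` tokens. A firing that runs concurrently with others cannot wait for tokens produced by an earlier firing in the same block, so at most floor(delay / rate) firings can execute in parallel. The helper names `max_parallel_firings` and `clamp_vectorization` are hypothetical illustrations, not code from Lin et al. (2018).

```python
def max_parallel_firings(rate: int, delay: int) -> int:
    """Upper bound on concurrent firings of an actor with an SDF self-loop
    (hypothetical helper for illustration).

    Assumes the self-loop consumes and produces `rate` tokens per firing
    and initially holds `delay` tokens. Each concurrent firing must find
    its `rate` input tokens already on the edge.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if delay < 0:
        raise ValueError("delay must be non-negative")
    # A result of 0 means the actor cannot fire at all (the self-loop deadlocks).
    return delay // rate


def clamp_vectorization(requested: int, rate: int, delay: int) -> int:
    """Clamp a requested vectorization degree to the self-loop bound."""
    return min(requested, max_parallel_firings(rate, delay))


# A stateful actor with a 1-token self-loop and a single delay
# cannot be data-parallelized at all:
print(clamp_vectorization(requested=256, rate=1, delay=1))   # 1
# With 64 initial tokens and 2 tokens per firing, up to 32 firings fit:
print(clamp_vectorization(requested=256, rate=2, delay=64))  # 32
```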
“…Many tools are able to analyze SDF graphs, to derive various properties (e.g., mapping and buffer size), and finally to generate the glue code of the schedule automatically: for example, DIF-GPU [14], PREESM [15], MAPS [16], Diplomat [17], Gaspard [18], PeaCE [19], and Ptolemy [20]. But these tools either do not jointly consider real-time execution and FPP scheduling, or do not perform all syntheses automatically.…”
Section: Related Work (mentioning, confidence: 99%)
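Most SDF analyses of the kind listed above start from the balance equations: for every edge from actor u to actor v with production rate p and consumption rate c, a periodic schedule must satisfy q[u]·p = q[v]·c, where q is the repetitions vector. The sketch below solves these equations for a connected graph by propagating rational firing rates along edges. It is a generic textbook construction, not the algorithm of any specific tool named above, and the `(src, dst, prod, cons)` edge-list format is an assumption.

```python
from collections import defaultdict, deque
from fractions import Fraction
from math import gcd, lcm


def repetitions_vector(edges):
    """Solve the SDF balance equations q[u] * prod == q[v] * cons.

    `edges` is a non-empty list of (src, dst, prod, cons) tuples describing
    a connected SDF graph (hypothetical input format). Returns the smallest
    positive integer repetitions vector, or raises ValueError if the graph
    is inconsistent (no non-trivial solution exists).
    """
    adj = defaultdict(list)
    for u, v, p, c in edges:
        # Forward: q[v] = q[u] * p / c; backward: q[u] = q[v] * c / p.
        adj[u].append((v, Fraction(p, c)))
        adj[v].append((u, Fraction(c, p)))

    # Breadth-first propagation of relative firing rates from one actor.
    start = edges[0][0]
    rate = {start: Fraction(1)}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, ratio in adj[u]:
            if v not in rate:
                rate[v] = rate[u] * ratio
                queue.append(v)

    # Every edge (including self-loops) must be balanced.
    for u, v, p, c in edges:
        if rate[u] * p != rate[v] * c:
            raise ValueError(f"inconsistent rates on edge {u}->{v}")

    # Scale the rational solution to the smallest positive integer vector.
    scale = lcm(*(r.denominator for r in rate.values()))
    ints = {a: int(r * scale) for a, r in rate.items()}
    g = gcd(*ints.values())
    return {a: n // g for a, n in ints.items()}


# A produces 2 tokens that B consumes 3 at a time; B produces 1 token
# that C consumes 2 at a time.
print(repetitions_vector([("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
```

Buffer-size and mapping analyses typically build on this vector, since it fixes how many times each actor fires per schedule period.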
“…In OpenCL terminology, the vectorization degree is commonly referred to as the number of global work items. Careful optimization of vectorization degrees can have major performance benefit for GPU acceleration of dataflow graphs [19].…”
Section: Throughput Optimization (mentioning, confidence: 99%)
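The correspondence between vectorization degree and global work items can be sketched as follows. Suppose an actor must fire q times per schedule iteration and is vectorized with degree b, with one work item per firing. Each kernel launch then covers up to b firings, so the iteration needs ceil(q / b) launches, and the last launch may be partial. The helper `launch_plan` and the one-work-item-per-firing mapping are assumptions for illustration. A real host program would pass each computed size as the global work size argument of `clEnqueueNDRangeKernel`.

```python
def launch_plan(repetitions: int, vector_degree: int) -> list[int]:
    """Split `repetitions` firings into kernel launches of at most
    `vector_degree` work items each (hypothetical helper).

    Assumes one OpenCL work item per actor firing, so each returned entry
    is the global work size of one launch.
    """
    if repetitions < 0 or vector_degree <= 0:
        raise ValueError("need repetitions >= 0 and vector_degree > 0")
    full, rest = divmod(repetitions, vector_degree)
    # Full-size launches first, then one partial launch for the remainder.
    return [vector_degree] * full + ([rest] if rest else [])


# 1000 firings per iteration with degree 256: three full launches
# and a partial one of 232 work items.
print(launch_plan(1000, 256))  # [256, 256, 256, 232]
```

A larger degree means fewer launches and less launch overhead. It also requires buffers large enough to hold the tokens of a whole block, which is why the choice of degree interacts with memory constraints.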