2014
DOI: 10.1007/978-3-319-09967-5_8

Parametric GPU Code Generation for Affine Loop Programs

Citations: cited by 5 publications (2 citation statements)
References: 22 publications
“…Several works investigate the generation of loop code for a number of processors unknown at compile time. Dyn-Tile [10] and D-Tiling [12] target general-purpose multi-cores; Kong et al [14] generate vectorized code for cores supporting SIMD processing; Konstantinidis et al [15] generate parallelized code for GPUs. However, none of these approaches apply to TCPAs because the target architectures do neither rely on cycle-accurate synchronization of components nor require PE-specific compact programs (see Section 6.1) to save space and keep instruction memories small.…”
Section: Other Symbolic Loop Compilation Approaches
Citation type: mentioning
confidence: 99%
“…Most studies focus on identifying data reuse (e.g., using a polyhedral model) [8]-[12] and exploit it by enabling local memory. Alternatively, in [13], the authors present a fully automated C-to-FPGA framework, including an end-to-end solution for on-chip buffer optimization that automatically detects and implements the available data reuse in a loop nest.…”
Section: Related Work
Citation type: mentioning
confidence: 99%