Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013
DOI: 10.7873/date.2013.127

Optimizing Remote Accesses for Offloaded Kernels: Application to High-Level Synthesis for FPGA

Abstract: Some data- and compute-intensive applications can be accelerated by offloading portions of codes to platforms such as GPGPUs or FPGAs. However, to get high performance for these kernels, it is mandatory to restructure the application, to generate adequate communication mechanisms for the transfer of remote data, and to make good usage of the memory bandwidth. In the context of the high-level synthesis (HLS), from a C program, of hardware accelerators on FPGA, we show how to automatically generate optimized remo…

Cited by 24 publications (27 citation statements)
References 15 publications

“…Previous work tries to establish analytic optimization formulations for the combined problem, such as optimizations of loop tiling parameters and reuse buffer selections are formulated into quadratic programming [9] and geometric programming [31] respectively. Alias et al uses tiling and prefetching to reduce the memory traffic [7], focusing on the Altera tool-chain. They proposed a formulation for the prefetching problem and the pipelining of communications, but their approach does not consider the balance between communication volume and scratchpad size/energy, nor any design-space exploration, contrary to the present work.…”
Section: Related Work (mentioning)
confidence: 99%
“…Exploiting data overlap of successive tiles is introduced only very recently [7], here it is used after optimization to remove redundant transfers. In section VIII we compare to this strategy (inter-tile reuse) and show that it is important to include inter-tile reuse into the tile size selection process.…”
Section: Related Work (mentioning)
confidence: 99%
“…(2) where T represents a tile, t < T represents the tiles scheduled for execution before the tile T, and t > T represents the tiles scheduled for execution after T. The denotation W(t > T) corresponds to $\bigcup_{t > T} W(t)$.…”
Section: Combining Load and Store Elimination (mentioning)
confidence: 99%
“…This results in the code shown in Figure 1.3, where isolated variables have been put in uppercase. Statements (3) and (5) correspond to the exact regions on scalar variables. Statements (2) and (4) … We show how convex array regions are used to generate calls to these operators.…”
Section: Introducing Convex Array Regions (mentioning)
confidence: 99%