Compiler reuse analysis for the mapping of data in FPGAs with RAM blocks

Baradaran, Nastaran; Park, Joonseok; Diniz, Pedro C.

doi:10.1109/fpt.2004.1393262

Cited by 16 publications

(17 citation statements)

References 10 publications


“…However, there are some differences between ASIC and FPGA architectures, notably the large quantity of distributed registers and the discrete sizes of on-chip RAM available on an FPGA platform. The use of on-chip embedded RAM and registers to facilitate data reuse in FPGAs has been reported in [17] and [18]. They use the following simple scheme.…”

Section: Introduction (mentioning)

confidence: 99%

Data-reuse exploration under an on-chip memory constraint for low-power FPGA-based systems

Liu, Constantinides, Masselos, et al. 2009

IET Comput. Digit. Tech.


Contemporary FPGA-based reconfigurable systems have been widely used to implement data dominated applications. In these applications data transfer and storage consume a large proportion of the system energy. Exploiting data reuse can introduce significant power savings, but also introduces the extra requirement for on-chip memory. To aid data reuse design exploration early during the design cycle, we present an optimization approach to achieve a power-optimal design satisfying an on-chip memory constraint in a targeted FPGA-based platform. The data reuse exploration problem is mathematically formulated and shown to be equivalent to the Multiple-Choice Knapsack Problem (MCKP). The solution to this problem for an application code corresponds to the decision of which array references are to be buffered on-chip and where loading reused data of the array references into on-chip memory happens in the code, in order to minimize power consumption for a fixed on-chip memory size. We also present an experimentally verified power model, capable of providing the relative power information between different data reuse design options of an application, resulting in a fast and efficient design space exploration. The experimental results demonstrate that the approach enables us to find the most power efficient design for all the benchmark circuits tested.
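The Multiple-Choice Knapsack formulation mentioned in the abstract above can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the authors' implementation: the function name `solve_mckp`, the `(memory words, estimated power)` option tuples, and the example numbers are all hypothetical. Each array reference is a class, exactly one buffering option is chosen per class, and total on-chip memory must stay within the budget.

```python
from typing import List, Optional, Tuple

# Hypothetical option encoding: (on-chip memory words needed, estimated power).
Option = Tuple[int, float]


def solve_mckp(classes: List[List[Option]],
               capacity: int) -> Optional[Tuple[float, List[int]]]:
    """Pick exactly one option per class, minimizing total power
    subject to total memory <= capacity. Returns (power, picks) or None."""
    inf = float("inf")
    # dp[m] = (lowest power, chosen option indices) with exactly m words used.
    dp = [(inf, [])] * (capacity + 1)
    dp[0] = (0.0, [])
    for options in classes:
        nxt = [(inf, [])] * (capacity + 1)
        for used, (power, picks) in enumerate(dp):
            if power == inf:
                continue  # this memory usage is not reachable
            for idx, (mem, cost) in enumerate(options):
                total = used + mem
                if total <= capacity and power + cost < nxt[total][0]:
                    nxt[total] = (power + cost, picks + [idx])
        dp = nxt
    best = min(dp, key=lambda entry: entry[0])
    return None if best[0] == inf else best


# Illustrative data (assumed): option 0 of each reference means "no buffer".
references = [
    [(0, 10.0), (4, 6.0), (8, 3.0)],  # array reference A
    [(0, 8.0), (6, 2.0)],             # array reference B
]
print(solve_mckp(references, capacity=10))  # (8.0, [1, 1])
```

With a 10-word budget the sketch buffers A with its 4-word option and B with its 6-word option; the larger 8-word option for A would leave no room to buffer B.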


“…Approaches in [14] and [15] determine which data should be transferred into SPM and when and where in a code these transfers happen to improve the performance of the code, based on memory access cost models. Research into buffering reused data in FPGA on-chip RAMs and registers has been carried out in [5], [7], [8], and [16]. In [16], applications speed up through pipelining with high data throughput, which is obtained by storing reused data in shift registers and shift on-chip RAMs.…”

Section: Introduction (mentioning)

confidence: 99%

“…In [16], applications speed up through pipelining with high data throughput, which is obtained by storing reused data in shift registers and shift on-chip RAMs. In [7] and [8], arrays more beneficial to minimize the memory access time are stored in either registers or on-chip RAMs if register is not available. The work in [5] formulates the problem of data-reuse exploration aimed at low power as the multichoice knapsack problem.…”

Section: Introduction (mentioning)

confidence: 99%
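The register-first placement described in the statement above (the most beneficial arrays go to registers, falling back to on-chip RAM when registers run out) might be sketched as a greedy pass. This is an assumed illustration rather than the method of the cited works: the function `assign_storage`, the benefit-per-word ranking, and the example sizes are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical array description: (name, size in words, memory accesses saved).
ArrayInfo = Tuple[str, int, int]


def assign_storage(arrays: List[ArrayInfo],
                   num_registers: int,
                   ram_words: int) -> Dict[str, str]:
    """Greedy placement: rank arrays by accesses saved per word (assumed
    metric), try registers first, then on-chip RAM, else leave off-chip.
    Sizes are assumed to be positive."""
    placement: Dict[str, str] = {}
    ranked = sorted(arrays, key=lambda a: a[2] / a[1], reverse=True)
    for name, size, _saved in ranked:
        if size <= num_registers:
            placement[name] = "register"
            num_registers -= size
        elif size <= ram_words:
            placement[name] = "ram"
            ram_words -= size
        else:
            placement[name] = "off-chip"
    return placement


# Illustrative data (assumed).
arrays = [("a", 4, 400), ("b", 8, 400), ("c", 16, 160)]
print(assign_storage(arrays, num_registers=8, ram_words=16))
# {'a': 'register', 'b': 'ram', 'c': 'off-chip'}
```

A greedy pass like this is simple but not optimal in general, which is one motivation for the knapsack-style formulation cited as [5] in the statement above.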


Combining Data Reuse With Data-Level Parallelization for FPGA-Targeted Hardware Compilation: A Geometric Programming Framework

Liu, Constantinides, Masselos, et al. 2009

IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.


Abstract: A nonlinear optimization framework is proposed in this paper to automate exploration of the design space consisting of data-reuse (buffering) decisions and loop-level parallelization, in the context of field-programmable-gate-array-targeted hardware compilation. Buffering frequently accessed data in on-chip memories can reduce off-chip memory accesses and open avenues for parallelization. However, the exploitation of both data reuse and parallelization is limited by the memory resources available on-chip. As a result, considering these two problems separately, e.g., first exploring data reuse and then exploring data-level parallelization, based on the data-reuse options determined in the first step, may not yield the performance-optimal designs for limited on-chip memory resources. We consider both problems at the same time, exposing the dependence between the two. We show that this combined problem can be formulated as a nonlinear program and further show that efficient solution techniques exist for this problem, based on recent advances in optimization of so-called geometric programming problems. The results from applying this framework to several real benchmarks implemented on a Xilinx device demonstrate that given different constraints on on-chip memory utilization, the corresponding performance-optimal designs are automatically determined by the framework. We have also implemented designs determined by a two-stage optimization method that first explores data reuse and then explores parallelization on the same platform, and by comparison, the performance-optimal designs proposed by our framework are faster than the designs determined by the two-stage method by up to 5.7 times.


“…Weinhardt and Luk [12] describe a limited compiler approach for using RAM blocks to cache the data in contemporary FPGAs. In our own work we have used the same data reuse analysis framework outlined in this paper to explore the area and space trade-offs of using RAM blocks to store scalar replaced variables [2], whereas So and Hall [11] exclusively use registers to cache the data. There has also been extensive work in hierarchical data mapping in order to improve overall performance metrics such as time or power [1,7,10].…”

Section: Storage Resource Allocation (mentioning)

confidence: 99%

A Register Allocation Algorithm in the Presence of Scalar Replacement for Fine-Grain Configurable Architectures

Baradaran, Diniz

Design, Automation and Test in Europe

Self Cite
