“…Many studies report that in most embedded system applications a large portion of energy consumption and execution time is due to memory access operations [10,11]. At the end of Section 3, we stated that the time would increase as O(N^3) for N variables in the input code.…”
Section: Compile Time Cost
Citation type: mentioning (confidence: 96%)
“…Edge (a, d) is selected because its N_ls/st is the largest. In consequence, three new MLS instructions, I_(1,4), I_(9,11), and I_(16,15), which respectively come from the results of merging instructions I_1 and I_4,…”
Section: Generating MLS Instructions
Citation type: mentioning (confidence: 99%)
[Figure residue: a table of N_l/s edge weights for candidate merges, including I_(1,4), I_(9,11), I_(16,15), I_(12,13,14), I_(1,2,4), and I_(9,10,11); only the instruction labels are recoverable.]
In a recent study, we discovered that many single load/store operations in embedded applications can be parallelized and thus encoded simultaneously in a single-instruction multiple-data instruction, called the multiple load/store (MLS) instruction. In this work, we investigate the problem of utilizing MLS instructions to produce optimized machine code, and propose an effective approach to it. Specifically, we formalize the MLS problem, that is, the problem of maximizing the use of MLS instructions under an unlimited register file size. Based on this formalization, we show that the problem can be solved efficiently by translating it into a variant of the problem of finding a maximum weighted path cover in a dynamic weighted graph. To handle the more realistic case of a finite register file, we then extend our solution to take into account the register-sequencing constraints of MLS instructions and the limited register resources of the target processor. We demonstrate the effectiveness of our approach experimentally on a set of benchmark programs. In summary, our approach reduces the number of loads/stores by 13.3% on average, compared with code generated by existing compilers, and the total code size by 3.6%. This reduction comes at almost no cost: the overall increase in compilation time due to our technique remains minimal.
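The "Generating MLS Instructions" snippet above hints at the greedy flavor of the path-cover translation: candidate merges form edges weighted by N_ls/st, the estimated number of loads/stores saved, and the heaviest edge is merged first, yielding instructions such as I_(1,4) and I_(9,11). The following is a minimal sketch of that step only; the weight function, the tuple encoding of merged instructions, and the toy mergeability rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of greedy MLS merging on a dynamic weighted graph.
# Each group of instruction ids is a node; weight(a, b) plays the role of
# the paper's N_ls/st and returns 0 when merging a and b is illegal.

def greedy_mls_merge(instrs, weight):
    """instrs: list of instruction-id tuples, e.g. [(1,), (4,), (9,), (11,)].
    Returns the final groups; a group like (1, 4) stands for I_(1,4)."""
    groups = [tuple(i) for i in instrs]
    while True:
        # Rebuild candidate edges each round: the graph is dynamic because
        # each merge changes which further merges are legal and their savings.
        edges = [(weight(a, b), a, b)
                 for i, a in enumerate(groups)
                 for b in groups[i + 1:]
                 if weight(a, b) > 0]
        if not edges:
            return groups
        # "Edge (a, d) is selected because its N_ls/st is the largest."
        _, a, b = max(edges)
        groups.remove(a)
        groups.remove(b)
        groups.append(a + b)   # e.g. merging I_1 and I_4 yields I_(1,4)

# Toy run with a purely illustrative mergeability rule: two groups merge
# (saving 1 load/store) when their leading ids differ by 2 or 3.
demo = [(1,), (4,), (9,), (11,)]
print(greedy_mls_merge(demo, lambda a, b: 1 if abs(a[0] - b[0]) in (2, 3) else 0))
# -> [(9, 11), (1, 4)], i.e. I_(9,11) and I_(1,4) as in the quoted example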
“…Yet the emphasis of this paper is memory management and data transfers. On this topic, a few works on reconfigurable computing introduce techniques for managing memory and reducing the number of data transfers [22,3,29]. Still, the most active research concerns scratch-pad memories [18,16,27] or GPUs [2,6].…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…Still, the most active research concerns scratch-pad memories [18,16,27] or GPUs [2,6]. These approaches usually support single or shared memory organizations, with contributions such as compile-time or operating-system-based allocation and copy policies [18,27,29], new memory allocators [16], and schedule-based optimizations that reduce the cost of data transfers [22,2,6]. They are complementary to our approach, which supports run-time decisions and targets a distributed memory organization in which the datapath tiles can only access their local memories.…”
This paper presents a hybrid compile-time and run-time memory management technique for a 3D-stacked reconfigurable accelerator that includes a memory layer composed of multiple memory units whose parallel access provides very high bandwidth. The technique inserts allocation, free, and data-transfer operations into the code to use the memory layer, and avoids memory overflows by adding a limited number of extra copies to and from the host memory. When compile-time information is lacking, the technique relies on run-time decisions to control these memory operations. Experiments show that, compared with a pessimistic approach, the overhead of avoiding overflows can be cut on average by 27%, 45%, and 63% when the size of each memory unit is 1 kB, 128 kB, and 1 MB, respectively.
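As a rough illustration of the overflow-avoidance idea in this abstract, the sketch below models a single memory unit of the memory layer and falls back to copying a resident buffer to host memory when an allocation would overflow. The MemoryUnit interface, the FIFO eviction policy, and the print stand-ins for data transfers are assumptions for illustration; the paper itself combines compile-time placement of these operations with run-time decisions.

```python
# Hypothetical run-time allocator for one memory unit of the memory layer.
# On overflow, a resident buffer is copied back to host memory to make room,
# mirroring the "limited number of additional copies" in the abstract.

class MemoryUnit:
    def __init__(self, capacity):
        self.capacity = capacity      # e.g. 1 kB, 128 kB, or 1 MB per unit
        self.used = 0
        self.resident = []            # (name, size) buffers in this unit

    def alloc(self, name, size):
        if size > self.capacity:
            raise MemoryError(f"{name} cannot fit in one memory unit")
        # Run-time decision: copy buffers back to the host until it fits.
        while self.used + size > self.capacity:
            victim, vsize = self.resident.pop(0)   # FIFO eviction (assumed)
            print(f"copy {victim} back to host memory")  # stand-in transfer
            self.used -= vsize
        self.resident.append((name, size))
        self.used += size

    def free(self, name):
        for i, (n, s) in enumerate(self.resident):
            if n == name:
                self.resident.pop(i)
                self.used -= s
                return

# Toy run with a 1 kB unit: the third allocation triggers a host copy.
unit = MemoryUnit(capacity=1024)
unit.alloc("A", 512)
unit.alloc("B", 384)
unit.alloc("C", 256)   # overflow -> "A" is copied back to the host
```

Smaller units force more of these extra copies, which is consistent with the reported overhead reductions growing from 27% at 1 kB to 63% at 1 MB.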