“…Many studies report that in most embedded system applications a large portion of energy consumption and execution time is due to memory access operations [10,11]. At the end of Section 3, we stated that the time would increase as O(N^3) for N variables in the input code.…”
Section: Compile Time Cost
Citation type: mentioning (confidence: 96%)
“…Edge (a, d) is selected because its N_ls/st is the largest. In consequence, three new MLS instructions, I_(1,4), I_(9,11), and I_(16,15), which respectively come from the results of merging instructions I_1 and I_4,…”
Section: Generating MLS Instructions
Citation type: mentioning (confidence: 99%)
[Figure residue: a table of N_l/s edge weights for candidate merges, including I_(1,4), I_(9,11), I_(16,15), I_(12,13,14), I_(1,2,4), and I_(9,10,11); only the instruction labels are recoverable.]
In a recent study, we discovered that many single load/store operations in embedded applications can be parallelized and thus encoded simultaneously in a single-instruction multiple-data instruction, called the multiple load/store (MLS) instruction. In this work, we investigate the problem of utilizing MLS instructions to produce optimized machine code, and propose an effective approach to it. Specifically, we formalize the MLS problem, that is, the problem of maximizing the use of MLS instructions under an unlimited register file size. Based on this formalization, we show that the problem can be solved efficiently by translating it into a variant of the problem of finding a maximum weighted path cover in a dynamic weighted graph. To handle the more realistic case of a finite register file, we then extend our solution to take into account the register-sequencing constraints of MLS instructions and the limited register resources of the target processor. We demonstrate the effectiveness of our approach experimentally on a set of benchmark programs. In summary, our approach reduces the number of loads/stores by 13.3% on average, compared with code generated by existing compilers, and the total code size by 3.6%. This reduction comes at almost no cost: the overall increase in compilation time due to our technique remains minimal.
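The "Generating MLS Instructions" snippet above hints at the greedy flavor of the path-cover translation: candidate merges form edges weighted by N_ls/st, the estimated number of loads/stores saved, and the heaviest edge is merged first, yielding instructions such as I_(1,4) and I_(9,11). The following is a minimal sketch of that step only; the weight function, the tuple encoding of merged instructions, and the toy mergeability rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of greedy MLS merging on a dynamic weighted graph.
# Each group of instruction ids is a node; weight(a, b) plays the role of
# the paper's N_ls/st and returns 0 when merging a and b is illegal.

def greedy_mls_merge(instrs, weight):
    """instrs: list of instruction-id tuples, e.g. [(1,), (4,), (9,), (11,)].
    Returns the final groups; a group like (1, 4) stands for I_(1,4)."""
    groups = [tuple(i) for i in instrs]
    while True:
        # Rebuild candidate edges each round: the graph is dynamic because
        # each merge changes which further merges are legal and their savings.
        edges = [(weight(a, b), a, b)
                 for i, a in enumerate(groups)
                 for b in groups[i + 1:]
                 if weight(a, b) > 0]
        if not edges:
            return groups
        # "Edge (a, d) is selected because its N_ls/st is the largest."
        _, a, b = max(edges)
        groups.remove(a)
        groups.remove(b)
        groups.append(a + b)   # e.g. merging I_1 and I_4 yields I_(1,4)

# Toy run with a purely illustrative mergeability rule: two groups merge
# (saving 1 load/store) when their leading ids differ by 2 or 3.
demo = [(1,), (4,), (9,), (11,)]
print(greedy_mls_merge(demo, lambda a, b: 1 if abs(a[0] - b[0]) in (2, 3) else 0))
# -> [(9, 11), (1, 4)], i.e. I_(9,11) and I_(1,4) as in the quoted example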
“…Yet the emphasis of this paper is memory management and data transfers. On this topic, a few works on reconfigurable computing introduce techniques for managing memory and reducing the number of data transfers [22,3,29]. Still, the most active research concerns scratch-pad memories [18,16,27] or GPUs [2,6].…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
“…Still, the most active research concerns scratch-pad memories [18,16,27] or GPUs [2,6]. These approaches usually support single or shared memory organizations, with contributions such as compile-time or operating-system-based allocation and copy policies [18,27,29], new memory allocators [16], and schedule-based optimizations that reduce the cost of data transfers [22,2,6]. They are complementary to our approach, which supports run-time decisions and targets a distributed memory organization in which the datapath tiles can only access their local memories.…”
This paper presents a hybrid compile-time and run-time memory management technique for a 3D-stacked reconfigurable accelerator that includes a memory layer composed of multiple memory units whose parallel access provides very high bandwidth. The technique inserts allocation, free, and data-transfer operations into the code to use the memory layer, and avoids memory overflows by adding a limited number of extra copies to and from the host memory. When compile-time information is lacking, the technique relies on run-time decisions to control these memory operations. Experiments show that, compared with a pessimistic approach, the overhead of avoiding overflows can be cut on average by 27%, 45%, and 63% when the size of each memory unit is 1 kB, 128 kB, and 1 MB, respectively.
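As a rough illustration of the overflow-avoidance idea in this abstract, the sketch below models a single memory unit of the memory layer and falls back to copying a resident buffer to host memory when an allocation would overflow. The MemoryUnit interface, the FIFO eviction policy, and the print stand-ins for data transfers are assumptions for illustration; the paper itself combines compile-time placement of these operations with run-time decisions.

```python
# Hypothetical run-time allocator for one memory unit of the memory layer.
# On overflow, a resident buffer is copied back to host memory to make room,
# mirroring the "limited number of additional copies" in the abstract.

class MemoryUnit:
    def __init__(self, capacity):
        self.capacity = capacity      # e.g. 1 kB, 128 kB, or 1 MB per unit
        self.used = 0
        self.resident = []            # (name, size) buffers in this unit

    def alloc(self, name, size):
        if size > self.capacity:
            raise MemoryError(f"{name} cannot fit in one memory unit")
        # Run-time decision: copy buffers back to the host until it fits.
        while self.used + size > self.capacity:
            victim, vsize = self.resident.pop(0)   # FIFO eviction (assumed)
            print(f"copy {victim} back to host memory")  # stand-in transfer
            self.used -= vsize
        self.resident.append((name, size))
        self.used += size

    def free(self, name):
        for i, (n, s) in enumerate(self.resident):
            if n == name:
                self.resident.pop(i)
                self.used -= s
                return

# Toy run with a 1 kB unit: the third allocation triggers a host copy.
unit = MemoryUnit(capacity=1024)
unit.alloc("A", 512)
unit.alloc("B", 384)
unit.alloc("C", 256)   # overflow -> "A" is copied back to the host
```

Smaller units force more of these extra copies, which is consistent with the reported overhead reductions growing from 27% at 1 kB to 63% at 1 MB.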