Enabling run-time memory data transfer optimizations at the system level with automated extraction of embedded software metadata information

Bartzas, Peon-Quiros, Mamagkakis, Catthoor, Soudris, Mendias

doi:10.1109/aspdac.2008.4483990

Cited by 7 publications

(5 citation statements)

References 13 publications


“…In contrast to previous work, [5] instruments just the dynamic memory management primitives along with all the accesses to memory. A data identification algorithm is proposed and based on the results, DMA optimizations are implemented.…”

Section: Background and Related Work (mentioning)

confidence: 87%

Runtime Memory Allocation in a Heterogeneous Reconfigurable Platform

Sima

Bertels

2009

2009 International Conference on Reconfigurable Computing and FPGAs


In this paper, we present a runtime memory allocation algorithm that aims to substantially reduce the overhead caused by shared-memory accesses by allocating memory directly in the local scratch pad memories. We target a heterogeneous platform with a complex memory hierarchy. Using special instrumentation, we determine which memory areas are used in functions that could run on different processing elements, such as a reconfigurable logic array. Based on profile information, the programmer annotates some functions as candidates for accelerated execution. An algorithm then decides the best allocation, taking into account the various processing elements and the special scratch pad memories of the heterogeneous platform. Tests are performed on our prototype platform, a Virtex ML410 running the Linux operating system, which contains a PowerPC processor and a Xilinx FPGA and implements the MOLEN programming paradigm. We test the algorithm using both a state-of-the-art H.264 video encoder and synthetic applications. The performance improvement for the H.264 application is 14% compared to the software-only version, while the overhead is less than 1% of the application execution time. This is the optimal improvement that can be obtained by optimizing the memory allocation. For the synthetic applications, the results are within 5% of the optimum.



“…Therefore, the information regarding the behavior of the application in terms of its dynamic memory consumption must be collected at run-time. As such, the method starts with a metadata collection step before the different optimization steps are applied, resulting in a method flow to tackle dynamic memory [22].…”

Section: Proposed Optimization Methods (mentioning)

confidence: 99%

“…The second one uses the DMA module for blocks of more than 32 bytes (eight words), but the maximum number of cycles that the DMA engine may hold the bus during burst transactions is limited to eight words (therefore, once the DMA is granted access to the bus, it can transfer without interruptions at least as many bytes as the shortest transfer). Finally, the third configuration employs the DMA module for transfers of at least 32 bytes, but ensures that the DMA may access up to a full DRAM page in a single burst transaction to maximize the efficiency; additionally, this last policy uses the techniques presented in [135] and [22] to decide whether to use the DMA module or not if the system can recognize the current input case. We refer to these policies as "No DMA," "DMA Bad" and "DMA Opt", respectively.…”

Section: Dynamic Memory Block Transfer Optimization (mentioning)

confidence: 99%
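The three transfer policies quoted above can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `Policy` and `plan_transfer` and the DRAM page size of 1024 bytes are hypothetical, the "No DMA" policy is assumed to copy everything with the processor, and the input-case recognition step that "DMA Opt" takes from [135] and [22] is omitted because the quoted text does not describe it.

```python
from dataclasses import dataclass

WORD_BYTES = 4               # 32 bytes = eight words, as in the quoted text
DMA_THRESHOLD_BYTES = 32     # block size at which the DMA module comes into play
DRAM_PAGE_BYTES = 1024       # hypothetical page size, not given in the quoted text


@dataclass(frozen=True)
class Policy:
    name: str
    uses_dma: bool
    strict_threshold: bool   # True: size > 32 bytes; False: size >= 32 bytes
    max_burst_bytes: int     # longest burst the DMA engine may hold the bus for


NO_DMA = Policy("No DMA", False, False, 0)
DMA_BAD = Policy("DMA Bad", True, True, 8 * WORD_BYTES)
DMA_OPT = Policy("DMA Opt", True, False, DRAM_PAGE_BYTES)


def plan_transfer(policy: Policy, size_bytes: int) -> int:
    """Return the number of DMA bursts needed, or 0 if the processor copies the block."""
    if not policy.uses_dma:
        return 0
    if policy.strict_threshold:
        eligible = size_bytes > DMA_THRESHOLD_BYTES
    else:
        eligible = size_bytes >= DMA_THRESHOLD_BYTES
    if not eligible:
        return 0
    # Ceiling division: each burst moves at most max_burst_bytes.
    return -(-size_bytes // policy.max_burst_bytes)
```

Under these assumptions, a 2048-byte block needs 64 bursts with "DMA Bad" but only 2 with "DMA Opt", which is the efficiency gap the policy names hint at.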

“…This ensures that no comparisons between individual DDTs, but between the DDTs and the groups, are done, reducing the algorithm complexity. To evaluate the inclusion of a DDT in a group, the new combined behavior is calculated (lines 22-23). This is a straightforward process that involves combining two ordered lists and accounting for the accumulated footprint and accesses of the group and the DDT.…”

Section: Liveness and Exploitation Ratio (mentioning)

confidence: 99%
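The combination step described in the statement above, merging two ordered lists while accumulating footprint and accesses, might look roughly like the following. The event layout `(time, footprint_delta_bytes, accesses)` and the function name `combine_behavior` are assumptions made for illustration; the cited work may represent DDT behavior differently.

```python
import heapq
from typing import List, Tuple

# Hypothetical event layout: (time, footprint_delta_bytes, accesses)
Event = Tuple[int, int, int]


def combine_behavior(group: List[Event], ddt: List[Event]) -> Tuple[List[Event], int, int]:
    """Merge two time-ordered event lists and return the merged list,
    the peak combined footprint, and the total number of accesses."""
    # Both inputs must already be sorted by time; merge keeps that order.
    merged = list(heapq.merge(group, ddt, key=lambda event: event[0]))
    running = peak = total_accesses = 0
    for _, delta, accesses in merged:
        running += delta                 # footprint currently held by group plus DDT
        peak = max(peak, running)
        total_accesses += accesses
    return merged, peak, total_accesses


group_events = [(0, 100, 5), (10, -100, 0)]
ddt_events = [(5, 50, 3), (20, -50, 0)]
merged_events, peak_footprint, total = combine_behavior(group_events, ddt_events)
```

Because both lists are already ordered, the merge is linear in their combined length, which fits the statement's point about keeping the algorithm's complexity low.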


Dynamic Memory Management for Embedded Systems

Alonso

Mamagkakis

Poucet

et al. 2015

Self Cite


“…More information about this application and the network traces can be found in [34]. The case study was evaluated on a cycle-accurate Network-on-Chip (NoC) simulation environment [18].…”

Section: Application Overview (mentioning)

confidence: 99%