“…Specifically, two memory phases are considered: an acquisition (or load) phase that copies data and instructions from main memory into local memory, and a replication (or unload) phase that copies modified data back to main memory. While the computation phase is always executed on a processor, the memory phases can be either executed on the processor itself [5,6,13,22,26,49,50,53,56,71,72], or on another hardware component [30,31], such as a programmable Direct Memory Access (DMA) module [7,20,61,66]. Works that proposed using a DMA unit to perform the memory transfers [66] can efficiently hide the memory latency by overlapping the execution of a task with the DMA transfer of another task; this leads to considerable improvements in schedulability.…”