In this paper, we present a dynamic scratchpad memory allocation strategy targeting a horizontally partitioned memory subsystem for contemporary embedded processors. The memory subsystem is equipped with a memory management unit (MMU), and the physically addressed scratchpad memory (SPM) is mapped into the virtual address space. A small minicache is added to further reduce energy consumption and improve performance. Using the MMU's page-fault exception mechanism, we track page accesses and copy frequently executed code sections into the SPM before they are executed. Because the minimal transfer unit between the external memory and the SPM is a single memory page, effective code placement is critical to the success of our method. Based on profiling information, our postpass optimizer divides the application binary into pageable, cacheable, and uncacheable regions. The latter two are placed at fixed locations in the external memory; only pageable code is copied on demand from the external memory to the SPM. Pageable code is grouped into sections equal in size to the physical page size of the MMU. We discuss code-grouping techniques and analyze the effect of the minicache on execution time and energy consumption. We evaluate our SPM allocation strategy with twelve embedded applications, including MPEG-4. Compared to a fully-cached configuration, we achieve on average a 12% improvement in runtime performance and a 33% reduction in the energy consumption of the memory system.
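The page-fault-driven allocation described above can be illustrated with a minimal user-space C sketch. The page size, SPM capacity, and FIFO replacement policy below are illustrative assumptions, not the paper's exact parameters; in the real system the lookup is performed by the MMU's TLB and the handler runs in the exception context, whereas here both are folded into fetch() for clarity.

    /* Minimal sketch of a page-fault-driven scratchpad memory manager
     * (SPMM), simulated in user space. All sizes and the FIFO
     * replacement policy are illustrative assumptions. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 256                  /* assumed MMU page size (bytes)  */
    #define SPM_PAGES 4                    /* assumed SPM capacity in pages  */
    #define EXT_PAGES 16                   /* pageable region, external mem. */

    static unsigned char ext_mem[EXT_PAGES][PAGE_SIZE]; /* external memory  */
    static unsigned char spm[SPM_PAGES][PAGE_SIZE];     /* scratchpad       */
    static int  spm_tag[SPM_PAGES];        /* external page held by each slot */
    static int  next_victim;               /* FIFO replacement pointer        */
    static long fault_count;

    /* Return the SPM slot holding `page`, or -1 on a miss ("page fault"). */
    static int spm_lookup(int page)
    {
        for (int i = 0; i < SPM_PAGES; i++)
            if (spm_tag[i] == page)
                return i;
        return -1;
    }

    /* Fault handler: evict the FIFO victim, copy the faulting code page
     * from external memory into the SPM, and return the slot. */
    static int spmm_fault(int page)
    {
        int slot = next_victim;
        next_victim = (next_victim + 1) % SPM_PAGES;
        memcpy(spm[slot], ext_mem[page], PAGE_SIZE);
        spm_tag[slot] = page;
        fault_count++;
        return slot;
    }

    /* Simulated instruction fetch from the pageable virtual region. */
    static unsigned char fetch(unsigned vaddr)
    {
        int page = vaddr / PAGE_SIZE;
        int slot = spm_lookup(page);
        if (slot < 0)
            slot = spmm_fault(page);   /* MMU would raise the exception here */
        return spm[slot][vaddr % PAGE_SIZE];
    }

    int main(void)
    {
        memset(spm_tag, -1, sizeof spm_tag);
        /* A loop repeatedly executing two hot pages faults only twice:
         * after the first iteration, both pages reside in the SPM. */
        for (int iter = 0; iter < 1000; iter++) {
            fetch(0 * PAGE_SIZE);
            fetch(1 * PAGE_SIZE);
        }
        printf("page faults: %ld\n", fault_count);
        return 0;
    }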
In this work, we present a dynamic memory allocation technique for a novel, horizontally partitioned memory subsystem targeting contemporary embedded processors with a memory management unit (MMU). We propose to replace the on-chip instruction cache with a scratchpad memory (SPM) and a small minicache. Serializing the address translation with the actual memory access allows the memory system to access either the SPM or the minicache, but never both, for a given fetch. Independent of the SPM size and based solely on profiling information, a postpass optimizer classifies the code of an application binary into a pageable and a cacheable code region. The latter is placed at a fixed location in the external memory and cached by the minicache. The former, the pageable code region, is copied on demand to the SPM before execution. Both the pageable code region and the SPM are logically divided into pages the size of an MMU memory page. Using the MMU's page-fault exception mechanism, a runtime scratchpad memory manager (SPMM) tracks page accesses and copies frequently executed code pages to the SPM before they are executed. To minimize the number of page transfers from the external memory to the SPM, good code placement becomes increasingly important as the MMU page size grows. We discuss code-grouping techniques and analyze the effect of the MMU's page size on execution time, energy consumption, and external memory accesses. We show that significant energy savings are possible by using the data cache as a victim buffer for the SPM. We evaluate our SPM allocation strategy with fifteen applications, including H.264, MP3, MPEG-4, and PGP. The proposed memory system requires 8% less die area than a fully-cached configuration. On average, we achieve a 31% improvement in runtime performance and a 35% reduction in energy consumption with an MMU page size of 256 bytes.
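The postpass classification and page-sized grouping can likewise be sketched in C. The hot/cold threshold, the function table, and the simple next-fit packing below are hypothetical illustrations of the idea; the paper's optimizer works on the actual binary with profile data.

    /* Hypothetical sketch of the postpass step: split functions into
     * pageable (hot) and cacheable (cold) code by a profile-count
     * threshold, then pack pageable functions into page-sized sections
     * with next-fit so each page transfers to the SPM as one unit.
     * Threshold, data, and packing policy are illustrative assumptions. */
    #include <stdio.h>

    #define PAGE_SIZE     256   /* assumed MMU page size in bytes */
    #define HOT_THRESHOLD 1000  /* assumed profile-count cutoff   */

    struct func {
        const char *name;
        unsigned    size;        /* code size in bytes       */
        unsigned    exec_count;  /* profiled execution count */
    };

    int main(void)
    {
        struct func prog[] = {
            { "idct",        180, 50000 },  /* hot:  pageable  */
            { "mc_block",    200, 42000 },  /* hot:  pageable  */
            { "vlc_decode",   96, 30000 },  /* hot:  pageable  */
            { "init_tables", 150,     1 },  /* cold: cacheable */
        };
        unsigned n = sizeof prog / sizeof prog[0];
        unsigned page = 0, used = 0;

        for (unsigned i = 0; i < n; i++) {
            if (prog[i].exec_count < HOT_THRESHOLD) {
                printf("%-12s -> cacheable region (minicache)\n",
                       prog[i].name);
                continue;
            }
            /* Next-fit packing: open a new page when the function no
             * longer fits, so no function straddles a page boundary. */
            if (used + prog[i].size > PAGE_SIZE) {
                page++;
                used = 0;
            }
            printf("%-12s -> pageable page %u, offset %u\n",
                   prog[i].name, page, used);
            used += prog[i].size;
        }
        return 0;
    }

Better grouping heuristics (e.g., placing functions that call each other in the same page) reduce the fault count further, which is why code placement matters more as the page size grows.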