Scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM (Static Random Access Memory), is widely used in embedded systems. With the ever-widening performance gap between processors and main memory, it is important to reduce the serious off-chip memory access overheads caused by transferring data between SPM and off-chip memory. In this paper, we propose a novel compiler-assisted technique, ISOS (Iteration-access-pattern-based Space Overlapping SPM management), for dynamic SPM management with DMA (Direct Memory Access). In ISOS, we combine SPM and DMA for performance optimization by exploiting opportunities to overlap SPM space, so as to further utilize the limited SPM space and reduce the number of DMA operations. We implement our technique based on IMPACT and conduct experiments using a set of benchmarks from DSPstone and Mediabench on the cycle-accurate VLIW simulator of Trimaran. The experimental results show that our technique achieves run-time performance improvements over previous work: the average improvements are 13.15%, 19.05%, and 25.52% when the SPM sizes are 1 KB, 512 bytes, and 256 bytes, respectively.

... in embedded systems. However, it poses a huge challenge for the compiler to fully exploit SPM, since it is completely controlled by software. To manage SPM effectively, two kinds of compiler-managed methods have been proposed: static methods [6,8,10-17] and dynamic methods [1,18-30]. With static SPM management, the content of SPM is fixed and does not change while an application runs. With dynamic SPM management, the content of SPM is changed at run time according to the application's behavior. For dynamic SPM management, it is important to select an effective approach for transferring data between off-chip memory and SPM. This is because the latency of an off-chip memory access is about 10-100 times that of SPM [1,6,18,30], and many embedded applications in the image and video processing domains have significant data transfer requirements in addition to their computational requirements [9,31,32]. To reduce off-chip memory access overheads, dedicated cost-efficient hardware, DMA (Direct Memory Access) [33], is used to transfer data. The focus of this paper is on how to combine SPM and DMA in dynamic SPM management for optimizing loops, which are usually the most critical sections in embedded applications such as DSP and image processing. Our work is closely related to the work in [20,29,34-37]. In [20], Kandemir et al. proposed a dynamic SPM technique for loops that can determine memory layouts and the best loop access patterns, partition the available SPM space, and restructure the code for explicit data transfer. In [29], DMA is applied for data transfer between SPM and off-chip memory by applying graph coloring for SPM management. In [34,35], a two-level loop tiling technique with partitioning and pre-fetching is proposed for optimizi...
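To make the interaction between SPM and DMA concrete, the following is a minimal, hypothetical C sketch of dynamic SPM management for a loop using DMA double buffering: while the kernel computes on the tile currently held in one SPM buffer, the next tile is fetched from off-chip memory into the other buffer. The function names dma_start_read and dma_wait, the buffer layout, and the memcpy-based emulation are illustrative assumptions for a self-contained example, not the ISOS implementation described in the paper.

```c
/* Hypothetical sketch of dynamic SPM management with DMA double buffering.
 * dma_start_read()/dma_wait() are stand-ins for a platform DMA driver;
 * here they are emulated with memcpy so the example runs on a host PC. */
#include <stdio.h>
#include <string.h>

#define N    1024               /* elements in the off-chip array        */
#define TILE 64                 /* elements per SPM tile                 */

static int offchip[N];          /* models off-chip (slow) memory         */
static int spm[2][TILE];        /* models two SPM buffers (fast SRAM)    */

/* Stand-in for an asynchronous DMA read; a real driver would return
 * immediately and signal completion later. */
static void dma_start_read(int *dst, const int *src, int elems) {
    memcpy(dst, src, elems * sizeof(int));
}
static void dma_wait(void) { /* would block until the transfer finishes */ }

int main(void) {
    long long sum = 0;
    for (int i = 0; i < N; i++) offchip[i] = i;

    /* Prefetch the first tile into buffer 0. */
    dma_start_read(spm[0], &offchip[0], TILE);

    for (int t = 0; t < N / TILE; t++) {
        int cur = t & 1;            /* buffer holding the current tile    */
        dma_wait();                 /* make sure the current tile arrived */

        /* Start fetching the next tile into the other buffer while the
         * kernel computes on the current one (compute/transfer overlap). */
        if (t + 1 < N / TILE)
            dma_start_read(spm[cur ^ 1], &offchip[(t + 1) * TILE], TILE);

        for (int i = 0; i < TILE; i++)  /* loop kernel touches only SPM   */
            sum += spm[cur][i];
    }
    printf("sum = %lld\n", sum);        /* expected: N*(N-1)/2 = 523776   */
    return 0;
}
```

In this pattern the SPM cost is two tiles; techniques such as ISOS aim to shrink that footprint further by overlapping buffers whose live ranges do not conflict, which also reduces how many DMA transfers are issued.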
SUMMARY: Memory accesses introduce large timing overhead and power consumption because of the performance gap between processors and main memory. This paper describes and evaluates a technique, loop scheduling with memory access reduction (LSMAR), that replaces hidden redundant load operations with register operations in loop kernels and performs partial scheduling for the newly generated register operations subject to register constraints. By exploiting the data dependences of memory access operations, the LSMAR technique can effectively reduce the number of memory accesses in loop kernels, thereby improving timing performance. The technique has been implemented in the Trimaran compiler and evaluated using a set of benchmarks from DSPstone and MiBench on the cycle-accurate simulator of the Trimaran infrastructure. The experimental results show that, when the LSMAR technique is applied, the number of memory accesses is reduced by 18.47% on average over the benchmarks compared with when it is not applied. The measurements also indicate that the optimization leads to only a 1.41% average increase in code size. With such small code size expansion, the technique is more suitable for embedded systems than prior work.
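As an informal illustration of the kind of redundancy LSMAR targets (not the authors' scheduling algorithm itself, which operates on the compiler's loop kernel), consider a C loop in which consecutive iterations reload the same array element; carrying that value in a register removes one memory access per iteration. The function names and data below are assumptions made only for this sketch.

```c
/* Illustrative only: in the naive kernel the value loaded as a[i+1] is
 * loaded again as a[i] in the next iteration, i.e. a hidden redundant load. */
#include <stdio.h>

#define N 8

/* Naive kernel: two loads from a[] per iteration. */
static void smooth_naive(const int *a, int *b, int n) {
    for (int i = 0; i < n - 1; i++)
        b[i] = a[i] + a[i + 1];        /* a[i+1] will be reloaded as a[i] */
}

/* Register-carried version: one load per iteration. The value loaded as
 * a[i+1] is kept in `next` and reused as a[i] in the next iteration,
 * trading a memory access for a register operation, which is the general
 * idea behind memory access reduction in loop kernels. */
static void smooth_reg(const int *a, int *b, int n) {
    int cur = a[0];
    for (int i = 0; i < n - 1; i++) {
        int next = a[i + 1];           /* the only load in the kernel      */
        b[i] = cur + next;
        cur = next;                    /* register copy replaces a reload  */
    }
}

int main(void) {
    int a[N] = {1, 2, 3, 4, 5, 6, 7, 8}, b1[N] = {0}, b2[N] = {0};
    smooth_naive(a, b1, N);
    smooth_reg(a, b2, N);
    for (int i = 0; i < N - 1; i++)
        printf("%d %d\n", b1[i], b2[i]);   /* both columns should match   */
    return 0;
}
```

The extra register operations introduced by such a transformation are exactly what motivates LSMAR's partial rescheduling under register constraints, so that the saved loads are not offset by register pressure.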