The main memory system is a shared resource in modern multicore machines. Sharing causes serious interference, which degrades performance in terms of both throughput and fairness. Numerous memory scheduling algorithms have been proposed to address the interference problem. However, these algorithms usually employ complex scheduling logic and require hardware modifications to memory controllers; as a result, industrial vendors seem hesitant to adopt them. This paper presents a practical software approach that effectively eliminates the interference without hardware modification. The key idea is to modify the OS memory management subsystem to adopt a page-coloring-based bank-level partition mechanism (BPM), which allocates specific DRAM banks to specific cores (threads). With BPM, memory controllers can passively schedule memory requests in a core-cluster (or thread-cluster) way. We implement BPM in the Linux 2.6.32.15 kernel and evaluate it on 4-core and 8-core real machines by running 20 randomly generated multi-programmed workloads (each containing 4 or 8 benchmarks) and a multi-threaded benchmark. Experimental results show that BPM improves overall system throughput by 4.7% on average (up to 8.6%) and reduces the maximum slowdown by 4.5% on average (up to 15.8%). Moreover, BPM also saves 5.2% of the energy consumption of the memory system.
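The page-coloring idea in this abstract can be illustrated with a minimal sketch. It is not the BPM kernel patch: the bank-selecting address bits (`BANK_BITS`), the 4 KiB page size, and the `ColoredAllocator` class are assumptions made for illustration, since the abstract does not give the actual address mapping or allocator design.

```python
# Hypothetical sketch of page-coloring-based bank partitioning.
# ASSUMPTIONS (not from the source): 4 KiB pages, and physical-address
# bits 13-15 select the DRAM bank. Real mappings are platform-specific.
PAGE_SHIFT = 12
BANK_BITS = (13, 14, 15)
NUM_COLORS = 1 << len(BANK_BITS)


def bank_color(phys_addr):
    """Gather the assumed bank-selecting bits into a small integer color."""
    color = 0
    for i, bit in enumerate(BANK_BITS):
        color |= ((phys_addr >> bit) & 1) << i
    return color


class ColoredAllocator:
    """Toy page allocator that gives each core a disjoint set of bank colors."""

    def __init__(self, num_frames, num_cores):
        per_core = NUM_COLORS // num_cores
        if per_core == 0:
            raise ValueError("more cores than bank colors")
        # Core i owns the contiguous color range [i*per_core, (i+1)*per_core).
        self.core_colors = {
            core: range(core * per_core, (core + 1) * per_core)
            for core in range(num_cores)
        }
        # One free list of page frame numbers per color.
        self.free = {c: [] for c in range(NUM_COLORS)}
        for pfn in range(num_frames):
            self.free[bank_color(pfn << PAGE_SHIFT)].append(pfn)

    def alloc_page(self, core):
        """Return a free page frame number from one of the core's own banks."""
        for color in self.core_colors[core]:
            if self.free[color]:
                return self.free[color].pop()
        raise MemoryError(f"no free pages in the banks of core {core}")
```

Because every frame handed to a core comes from that core's own colors, requests from different cores land in different banks, which is the property the abstract relies on to reduce interference. A real implementation would also have to handle a core exhausting its banks, which this sketch simply reports as an error.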
The memory system is often the main bottleneck in chip-multiprocessor (CMP) systems in terms of latency, bandwidth, and efficiency, and in the era of big data it additionally faces capacity and power problems. Much research has addressed parts of these problems, such as photonics technology for bandwidth, 3D stacking for capacity, and NVM for power, as well as many micro-architecture-level innovations. Many of these proposals require modifying the current memory architecture, since the decades-old synchronous memory architecture (SDRAM) has become an obstacle to adopting such advances. However, to the best of our knowledge, none of them provides a universal memory interface that is scalable enough to cover all these problems. In this paper, we argue that a message-based interface should replace the traditional bus-based interface in the memory system. We propose a novel message interface based memory system (MIMS). The key innovation of MIMS is that the processor and the memory system communicate through a universal and flexible message interface. Each message packet can contain multiple memory requests or commands along with various semantic information. The memory system becomes more intelligent and active by being equipped with a local buffer scheduler, which is responsible for processing packets, scheduling memory requests, and executing specific commands with the help of the semantic information. Simulation results show that, with messages of accurate granularity, MIMS improves performance by 53.21%, reduces energy-delay product (EDP) by 55.90%, and improves effective bandwidth utilization by 62.42%. Furthermore, combining multiple requests in a packet reduces link overhead and provides an opportunity for address compression.

However, main memory, which acts as the bridge between high-level data and the low-level processor, has failed to scale, making the memory system a main bottleneck.
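The MIMS abstract above mentions packing multiple requests into one packet and the resulting opportunity for address compression. The sketch below illustrates that idea with a simple base-plus-delta encoding. The `Request` and `Packet` layouts and the `pack`/`unpack` helpers are hypothetical; the abstract does not specify the MIMS packet format or its compression scheme.

```python
# Hypothetical sketch: several memory requests combined into one message
# packet, with addresses stored as offsets from a shared base.
# The field names and layout are assumptions, not the MIMS format.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    addr: int               # full physical address
    size: int               # bytes requested (per-request granularity)
    is_write: bool = False


@dataclass
class Packet:
    base: int                             # lowest address in the packet
    entries: List[Tuple[int, int, bool]]  # (delta from base, size, is_write)


def pack(requests: List[Request]) -> Packet:
    """Combine requests into one packet, keeping only small address deltas."""
    base = min(r.addr for r in requests)
    return Packet(base, [(r.addr - base, r.size, r.is_write) for r in requests])


def unpack(packet: Packet) -> List[Request]:
    """Rebuild the full-address requests on the memory side."""
    return [Request(packet.base + d, s, w) for d, s, w in packet.entries]
```

When the requests in a packet are close together, each delta needs far fewer bits than a full address, which is one way the link overhead per request could shrink. Each entry also carries its own size, reflecting the variable-granularity requests the abstract describes.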
Besides the well-known memory wall problem [53], the memory system also faces many other challenges (walls), which are summarized as follows:

Memory wall (latency): The original "memory wall" referred to the memory access latency problem [53], and it was the main problem in the memory system until the mid-2000s, when the CPU frequency race slowed down. Then came the multi/many-core age. The situation has since changed somewhat: queuing delays have become a major bottleneck and might contribute more than 70% of memory latency [50]. Future memory architectures should therefore place a higher priority on reducing queuing delays. Exploiting higher parallelism in the memory system could reduce queuing delays because it allows requests to be de-queued faster [50].

Bandwidth wall: The increasing number of concurrent memory requests, along with the increasing amount of data, results in heavy bandwidth pressure. However, memory bandwidth is failing to scale due to the relatively slow growth of processor-module pin counts (about 10% per year [2]). This has been termed the bandwidth wall [46]. The average memory bandwidth for...