Zehan Cui scite author profile

The main memory system is a shared resource in modern multicore machines. Sharing causes serious interference, which degrades performance in terms of both throughput and fairness. Numerous memory scheduling algorithms have been proposed to address the interference problem. However, these algorithms usually employ complex scheduling logic and require hardware modifications to memory controllers; as a result, industrial vendors seem hesitant to adopt them. This paper presents a practical software approach that effectively eliminates the interference without hardware modification. The key idea is to modify the OS memory management subsystem to adopt a page-coloring based bank-level partition mechanism (BPM), which allocates specific DRAM banks to specific cores (threads). With BPM, memory controllers can passively schedule memory requests in a core-cluster (or thread-cluster) way. We implement BPM in the Linux 2.6.32.15 kernel and evaluate it on real 4-core and 8-core machines by running 20 randomly generated multi-programmed workloads (each containing 4 or 8 benchmarks) and a multi-threaded benchmark. Experimental results show that BPM improves overall system throughput by 4.7% on average (up to 8.6%) and reduces the maximum slowdown by 4.5% on average (up to 15.8%). Moreover, BPM saves 5.2% of the memory system's energy consumption.

Going vertical in memory management: Handling multiplicity by multi-policy

Liu, Cui, et al. 2014

MIMS: Towards a Message Interface Based Memory System

Chen, Chen, Yuan, et al. 2014. J. Comput. Sci. Technol.

The memory system is often the main bottleneck in chip multiprocessor (CMP) systems in terms of latency, bandwidth, and efficiency, and in the era of big data it additionally faces capacity and power problems. Much research has addressed parts of these problems, such as photonics technology for bandwidth, 3D stacking for capacity, and NVM for power, as well as many microarchitecture-level innovations. Many of these approaches require modifying the current memory architecture, since the decades-old synchronous memory architecture (SDRAM) has become an obstacle to adopting such advances. However, to the best of our knowledge, none of them provides a universal memory interface that is scalable enough to cover all these problems.

In this paper, we argue that a message-based interface should replace the traditional bus-based interface in the memory system. A novel message interface based memory system (MIMS) is proposed. The key innovation of MIMS is that the processor and the memory system communicate through a universal and flexible message interface. Each message packet can contain multiple memory requests or commands along with various semantic information. The memory system becomes more intelligent and active by being equipped with a local buffer scheduler, which is responsible for processing packets, scheduling memory requests, and executing specific commands with the help of the semantic information. Simulation results show that, with accurate-granularity messages, MIMS improves performance by 53.21%, reduces the energy-delay product (EDP) by 55.90%, and improves effective bandwidth utilization by 62.42%. Furthermore, combining multiple requests in a packet reduces link overhead and provides an opportunity for address compression.

However, main memory, which acts as the bridge between high-level data and the low-level processor, has failed to scale, making the memory system a main bottleneck. Besides the well-known memory wall problem [53], the memory system also faces many other challenges (walls), which are summarized as follows:

Memory wall (latency): The original "memory wall" referred to the memory access latency problem [53], and it was the main problem in the memory system until the mid-2000s, when the CPU frequency race slowed down. Then came the multi/many-core age. The situation has since changed somewhat: queuing delays have become a major bottleneck and might contribute more than 70% of memory latency [50]. Future memory architectures should therefore place a higher priority on reducing queuing delays. Exploiting higher parallelism in the memory system could reduce queuing delays because requests can be de-queued faster [50].

Bandwidth wall: The increasing number of concurrent memory requests, along with the increasing amount of data, results in heavy bandwidth pressure. However, memory bandwidth is failing to scale because of the relatively slow growth of the processor module's pin count (about 10% per year [2]). This has been termed the bandwidth wall [46]. The average memory bandwidth for...
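The MIMS abstract above describes packets that carry several memory requests and notes that combining requests reduces link overhead and allows address compression. The sketch below illustrates that idea only. The wire layout (`HEADER`, `ENTRY`), the `MemRequest` fields, and the base-plus-offset encoding are assumptions made for illustration and are not the packet format defined in the paper.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemRequest:
    addr: int        # physical address of the access
    size: int        # bytes requested (variable granularity)
    is_write: bool


# ASSUMED layout: one 9-byte header, then one 5-byte entry per request.
HEADER = struct.Struct("<QB")   # 64-bit base address, request count
ENTRY = struct.Struct("<HHB")   # 16-bit offset from base, 16-bit size, flags


def encode_packet(requests: List[MemRequest]) -> bytes:
    """Pack several requests into one packet, storing offsets from a shared base."""
    if not requests or len(requests) > 255:
        raise ValueError("packet must hold between 1 and 255 requests")
    base = min(r.addr for r in requests)
    body = b""
    for r in requests:
        offset = r.addr - base
        if offset > 0xFFFF or r.size > 0xFFFF:
            raise ValueError("request does not fit the compressed entry format")
        body += ENTRY.pack(offset, r.size, int(r.is_write))
    return HEADER.pack(base, len(requests)) + body


def decode_packet(data: bytes) -> List[MemRequest]:
    """Recover the full requests, as a memory-side buffer scheduler might."""
    base, count = HEADER.unpack_from(data, 0)
    requests = []
    for i in range(count):
        offset, size, flags = ENTRY.unpack_from(data, HEADER.size + i * ENTRY.size)
        requests.append(MemRequest(base + offset, size, bool(flags & 1)))
    return requests
```

Under this assumed layout, three nearby requests cost 24 bytes in one packet versus 42 bytes as three single-request packets, because the 64-bit base address is sent once and each request carries only a 16-bit offset.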

A software memory partition approach for eliminating bank-level interference in multicore systems