Heterogeneous computing relies on collaboration among different types of processors operating on shared data. In systems with discrete accelerators (e.g., GP-GPUs), data sharing requires transferring large amounts of data between CPU and accelerator memories, which can significantly increase end-to-end execution time. This paper proposes a novel mechanism called Demand MemCpy (DMC) to hide these data-sharing overheads. DMC copies data from host memory to accelerator memory on demand, at page granularity. It combines a hardware-only mechanism that fetches the requested page with short latency and a background pre-copy that fetches related pages in advance. Our evaluation shows that DMC reduces the end-to-end execution time of GP-GPU applications by 25.4% on average, by overlapping computation with data transfer and by not transferring unused pages.
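The demand-fetch-plus-pre-copy behavior the abstract describes can be illustrated with a small software simulation. This is a minimal sketch under stated assumptions: the page size, the pre-copy depth, and the class interface (`DemandMemCpy`, `access`, `background_step`) are all illustrative names invented here, not the paper's hardware interface.

```python
# Toy model of the DMC idea: pages move from host to accelerator memory
# only when touched (demand fetch), while a background pre-copy stages
# spatially adjacent pages ahead of demand. All parameters/names are
# illustrative assumptions, not the paper's actual mechanism.
from collections import deque

PAGE_SIZE = 4       # words per page (illustrative)
PRECOPY_DEPTH = 2   # neighbor pages queued per demand miss (assumption)

class DemandMemCpy:
    def __init__(self, host_pages):
        self.host = host_pages          # page_id -> data in host memory
        self.device = {}                # pages resident on the accelerator
        self.precopy_queue = deque()    # pages staged for background copy
        self.demand_fetches = 0
        self.precopy_fetches = 0

    def _copy(self, page_id):
        self.device[page_id] = self.host[page_id]

    def access(self, page_id):
        """Accelerator touches a page; copy it on demand if absent."""
        if page_id not in self.device:
            self.demand_fetches += 1
            self._copy(page_id)          # short-latency demand fetch
            # Queue spatially adjacent pages for background pre-copy.
            for n in range(1, PRECOPY_DEPTH + 1):
                nxt = page_id + n
                if nxt in self.host and nxt not in self.device:
                    self.precopy_queue.append(nxt)
        return self.device[page_id]

    def background_step(self):
        """Runs concurrently with compute: copy one queued page, if any."""
        while self.precopy_queue:
            page_id = self.precopy_queue.popleft()
            if page_id not in self.device:
                self.precopy_fetches += 1
                self._copy(page_id)
                break

# A kernel streaming pages 0..3 of a 10-page buffer: pre-copy hides
# most demand misses, and pages the kernel never touches stay on host.
host = {p: [p] * PAGE_SIZE for p in range(10)}
dmc = DemandMemCpy(host)
for p in range(4):
    dmc.access(p)
    dmc.background_step()
```

In this trace only two accesses pay a demand-fetch latency; the rest are hidden by pre-copy, and pages 5 through 9 are never transferred, mirroring the two savings the abstract claims.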
Data-intensive applications and throughput-oriented processors demand ever more memory bandwidth. Memory compression can provide effective bandwidth beyond physical limits, yet new data types and smaller block sizes challenge existing algorithms. This paper presents Multi-Prediction Compression (MPC), a novel, lightweight memory compression framework that increases effective memory bandwidth. Based on multiple prediction models and data-driven algorithm tuning, MPC provides 31.7% better compression than state-of-the-art (SOTA) algorithms for 32B blocks. Moreover, MPC is hardware-friendly and scales to support a growing number of data patterns.
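The multi-prediction idea can be sketched in software: several simple predictors each guess a 32B block's words, and the block is encoded against whichever predictor leaves the smallest residuals. The three predictors, the XOR residuals, and the size accounting below are assumptions made for illustration; MPC's actual models and data-driven tuning are not public in this abstract.

```python
# Toy sketch of multi-prediction block compression: try several
# predictors per 32B block and keep the cheapest encoding. The
# predictors and cost model are illustrative assumptions only.
import struct

WORD = 4    # bytes per word
BLOCK = 32  # compression granularity in bytes (32B blocks, as above)

def words(block):
    """Split a 32B block into eight little-endian 32-bit words."""
    return list(struct.unpack(f'<{BLOCK // WORD}I', block))

# Three toy prediction models for common data patterns:
PREDICTORS = {
    'zero':  lambda ws: [0] * len(ws),        # all-zero data
    'first': lambda ws: [ws[0]] * len(ws),    # one repeated value
    'delta': lambda ws: [ws[0]] + ws[:-1],    # each word ~ its predecessor
}

def nonzero_bytes(x):
    """Count nonzero bytes in a residual (its storage cost here)."""
    n = 0
    while x:
        if x & 0xFF:
            n += 1
        x >>= 8
    return n

def best_encoding(block):
    """Pick the predictor whose XOR residuals encode smallest."""
    ws = words(block)
    best = None
    for name, predict in PREDICTORS.items():
        residuals = [w ^ p for w, p in zip(ws, predict(ws))]
        # 1B predictor id + 4B of per-word byte masks + residual bytes
        size = 1 + 4 + sum(nonzero_bytes(r) for r in residuals)
        if best is None or size < best[1]:
            best = (name, size)
    return best

block = struct.pack('<8I', *[0xDEADBEEF] * 8)   # one repeated word
name, size = best_encoding(block)               # 5B instead of 32B raw
```

Picking the best of several cheap predictors per block, rather than one fixed model, is what lets this style of compressor cover a growing set of data patterns at small block sizes, and it maps naturally to parallel comparison logic in hardware.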