With increasing heterogeneity, the importance of data organization within a compute node has grown immensely. Recently, industry vendors have introduced technology that can present a unified shared address space across multiple physical pools of memory. In this paper, we leverage unified memory technology and characterize the performance trade-offs of host and device placement across a range of hybrid application design patterns. We perform a Roofline analysis to establish fundamental performance bounds in collaborative applications and then develop an analytical model that makes profitable placement decisions at the level of individual data structures. We integrate the placement model into a runtime system, enabling transparent data placement in CUDA/C++ applications. Preliminary experiments yield the following results: (i) placement policies have a significant performance impact across hybrid application design paradigms; (ii) placement decisions are affected by the sparsity of data access, page re-migration, the availability of latency-hiding opportunities, and design-specific attributes such as the number of pipeline stages; and (iii) intelligent data placement can improve node performance by up to 5× on applications with sparse access patterns.

CCS CONCEPTS
• General and reference → Performance; • Software and its engineering → Runtime environments.