On the Role of NVRAM in Data-intensive Architectures: An Evaluation

Essen, Brian Van; Pearce, Roger; Ames, Sasha; Gokhale, Maya

doi:10.1109/ipdps.2012.69

Cited by 37 publications

(36 citation statements)

References 8 publications


“…However, we, along with many others, have observed that the memory-map runtime in Linux is not suited for memory-mapped out-of-core applications [2] and cannot efficiently support this model. Even with highly optimized massively concurrent algorithms and high bandwidth low latency storage, applications designed to interact with very large working sets in main memory incur significant performance loss if they read and write data structures mapped to external storage as if they were in main memory.…”

Section: Introduction (mentioning)

confidence: 90%

“…It is a loadable Linux character device driver and it works outside of the standard Linux page caching system. It is derived from the PerMA simulator outlined in [2], sharing a common core codebase, and source code is available at [3]. It has been developed and tested for the 2.6.32 kernels in RHEL6.…”

Section: The Di-mmap Runtime (mentioning)

confidence: 99%

“…In prior work [2] we demonstrated that the standard memory-map runtime in Linux will rapidly lose performance as both concurrency increases and as memory within the system becomes constrained. At the time we speculated that these performance bottlenecks were due to (a) the overhead of dynamic page management, and (b) a page buffering scheme and eviction algorithm ill-suited to many data-intensive applications.…”

Section: Introduction (mentioning)

confidence: 99%


DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications

Essen, Hsieh, Ames, et al. 2012

2012 SC Companion: High Performance Computing, Networking Storage and Analysis

Self Cite


Abstract: We present DI-MMAP, a high-performance runtime that memory-maps large external data sets into an application's address space and shows significantly better performance than the Linux mmap system call. Our implementation is particularly effective when used with high-performance locally attached Flash arrays on highly concurrent, latency-tolerant data-intensive HPC applications. We describe the kernel module and show performance results on a benchmark test suite and on a new bioinformatics metagenomic classification application. For the complex metagenomics classification application, DI-MMAP performs up to 4.88× better than standard Linux mmap.


“…In 2015, Intel Corp. and Micron Technology unveiled 3D XPoint - non-volatile memory technology expected to be up to 1,000 times faster than NAND, 10 times denser than DRAM with latency of tens of nanoseconds and possible to be used as system memory [20], [21]. With declared relatively low price and expected market release in 2016 [22] [25] and - independently - Brian Van Essen et al [26], but papers are based on single node analysis and are focused rather on extending system memory (heap, stack, global data segments) than speeding up I/O operations in a distributed environment. Active NVRAM for I/O staging proposed by S. Kannan et al [27], [28] benefits from NVRAM located within each computing node speeding access to PFS up.…”

Section: Related Work (mentioning)

confidence: 99%

A Parallel MPI I/O Solution Supported by Byte-addressable Non-volatile RAM Distributed Cache

Malinowski, Czarnul, Dorozynski, et al. 2016

Annals of Computer Science and Information Systems


Abstract: While many scientific, large-scale applications are data-intensive, fast and efficient I/O operations have become of key importance for HPC environments. We propose an MPI I/O extension based on in-system distributed cache with data located in Non-volatile Random Access Memory (NVRAM) available in each cluster node. The presented architecture makes effective use of NVRAM properties such as persistence and byte-level access behind the MPI I/O API. Another advantage of the proposed solution is making development of a parallel application easy and efficient, as a programmer just needs to use the well-known MPI I/O data model and API while efficient file access is automatically provided without a need for application-level optimizations like avoiding frequent operations on small data. Results of experiments obtained with three different applications suggest that the extension significantly reduces file access time, especially for small I/O operations. By locating cache facilities on computing nodes, the extension decreases load of file system servers and makes I/O scalable.


“…These environments are required to accommodate the data storage requirements of large graphs. NVRAM has significantly slower access speeds than main-memory (DRAM), however our previous work demonstrated that it can be a powerful storage media for graph applications [4], [5].…”

Section: Introduction (mentioning)

confidence: 99%

Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory

Pearce, Gokhale, Amato 2013

2013 IEEE 27th International Symposium on Parallel and Distributed Processing

Self Cite


Abstract: We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge list partitioning technique, designed to accommodate high-degree vertices (hubs) that create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs to reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), K-Core decomposition, and Triangle Counting. We also demonstrate scalability on BG/P Intrepid by comparing to best known Graph500 results [1]. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process thirty-two times larger datasets with only a 39% performance degradation in Traversed Edges Per Second (TEPS).
