Proceedings of the 47th International Conference on Parallel Processing 2018
DOI: 10.1145/3225058.3225110
A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures

Abstract: This work presents a realistic performance model to execute scientific workflows on high-bandwidth-memory architectures such as the Intel Knights Landing. We provide a detailed analysis of the execution time on such platforms, taking into account transfers from both fast and slow memory and their overlap with computations. We discuss several scheduling and mapping strategies: not only must tasks be assigned to computing resources, but one also has to decide which fraction of input and output data will reside i…
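For intuition, here is a hedged sketch of what an overlap model of this kind often looks like (illustrative only; the abstract is truncated above, the paper's actual model may differ, and every symbol below is an assumption). If a task performs w units of work at speed s, and moves d_f bytes through fast memory at bandwidth β_f and d_s bytes through slow memory at bandwidth β_s, then with full overlap of computation and transfers its execution time is bounded by

\[
T \;\ge\; \max\!\left(\frac{w}{s},\ \frac{d_f}{\beta_f} + \frac{d_s}{\beta_s}\right),
\]

that is, the task finishes no earlier than the slower of its computation and its combined memory traffic. The mapping decision the abstract mentions then amounts to choosing the d_f/d_s split for each task's inputs and outputs.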

Cited by 6 publications (2 citation statements)
References: 12 publications
“…In this technique, the data is divided into chunks of a few GB, and the staged access is applied to each of them in turn. Several recent studies also focus on data management for hybrid memory systems [3,6,11,21,27,36,38], but none of them exploits this large performance impact of the access pattern to improve software-based data placement decisions at runtime.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
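As a hedged illustration of the chunked, staged-access technique this statement describes (a minimal sketch; the chunk size, buffer handling, and memcpy-based staging are assumptions, not the citing paper's implementation), each chunk of a large array in slow memory is first staged into a fast-memory buffer and then processed from there:

    #include <algorithm>
    #include <cstddef>
    #include <cstring>

    // Hypothetical chunk size: the citation says "a few GB" per chunk.
    constexpr std::size_t kChunkBytes = 2ULL << 30;

    // Placeholder kernel: reads and writes its chunk in fast memory only.
    void process(double* chunk, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) chunk[i] *= 2.0;
    }

    // Staged access: copy one chunk from slow to fast memory, compute on it,
    // write it back, then move on to the next chunk.
    void staged_access(double* slow_data, std::size_t total_elems,
                       double* fast_buffer /* resides in fast memory */) {
        const std::size_t chunk_elems = kChunkBytes / sizeof(double);
        for (std::size_t off = 0; off < total_elems; off += chunk_elems) {
            const std::size_t n = std::min(chunk_elems, total_elems - off);
            std::memcpy(fast_buffer, slow_data + off, n * sizeof(double)); // stage in
            process(fast_buffer, n);                                       // compute
            std::memcpy(slow_data + off, fast_buffer, n * sizeof(double)); // stage out
        }
    }

On Knights Landing, fast_buffer would typically be allocated in MCDRAM, for example with hbw_malloc from the memkind library; whether the citing paper stages data this way is an assumption here.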
“…Here, to simplify the explanation, we show the sequential code; in our evaluation, the code is actually parallelized with OpenMP. We parallelize it as follows: (1) to abort OpenMP parallel for loops partway through, we use the cancel for directive; (2) to minimize communication among threads, we keep each filter's statistics in private variables and collect them with atomic updates just after the sampling ends.…”
Citation type: mentioning
Confidence: 99%
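A hedged sketch of the parallelization pattern this statement describes (the filter predicate, sample count, and abort threshold below are hypothetical): each thread keeps its statistics in private variables, an early-exit condition triggers cancel for, and the private partials are merged with atomic updates right after the loop.

    #include <cstdio>

    int main() {
        const long kSamples = 100000000L;
        long global_pass = 0, global_fail = 0;   // overall filter statistics

        #pragma omp parallel
        {
            long pass = 0, fail = 0;             // (2) thread-private statistics

            #pragma omp for
            for (long i = 0; i < kSamples; ++i) {
                if (i % 3 == 0) ++pass; else ++fail;   // hypothetical filter

                // (1) request early termination of the whole worksharing loop.
                if (pass + fail > 1000000) {
                    #pragma omp cancel for
                }
                #pragma omp cancellation point for
            }

            // Collect the private statistics just after the sampling ends.
            #pragma omp atomic
            global_pass += pass;
            #pragma omp atomic
            global_fail += fail;
        }

        std::printf("pass=%ld fail=%ld\n", global_pass, global_fail);
        return 0;
    }

Note that OpenMP cancellation only takes effect when the program runs with OMP_CANCELLATION=true; otherwise the cancel request is ignored and the full loop executes, which keeps the sketch correct either way.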