Proceedings of the 47th International Conference on Parallel Processing 2018
DOI: 10.1145/3225058.3225110
A Performance Model to Execute Workflows on High-Bandwidth-Memory Architectures

Abstract: This work presents a realistic performance model to execute scientific workflows on high-bandwidth-memory architectures such as the Intel Knights Landing. We provide a detailed analysis of the execution time on such platforms, taking into account transfers from both fast and slow memory and their overlap with computations. We discuss several scheduling and mapping strategies: not only must tasks be assigned to computing resources, but one also has to decide which fraction of input and output data will reside i…
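For intuition, here is a hedged sketch of what an overlap model of this kind often looks like (illustrative only; the abstract is truncated above, the paper's actual model may differ, and every symbol below is an assumption). If a task performs w units of work at speed s, and moves d_f bytes through fast memory at bandwidth β_f and d_s bytes through slow memory at bandwidth β_s, then with full overlap of computation and transfers its execution time is bounded by

\[
T \;\ge\; \max\!\left(\frac{w}{s},\ \frac{d_f}{\beta_f} + \frac{d_s}{\beta_s}\right),
\]

that is, the task finishes no earlier than the slower of its computation and its combined memory traffic. The mapping decision the abstract mentions then amounts to choosing the d_f/d_s split for each task's inputs and outputs.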

Cited by 6 publications (2 citation statements)
References: 12 publications
“…In this technique, the data is divided into chunks of a few GB, and the staged access is applied to each of them in turn. Several recent studies also focus on data management for hybrid memory systems [3,6,11,21,27,36,38], but none of them exploits this large performance impact of the access pattern to improve software-based data placement decisions at runtime.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
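As a hedged illustration of the chunked, staged-access technique this statement describes (a minimal sketch; the chunk size, buffer handling, and memcpy-based staging are assumptions, not the citing paper's implementation), each chunk of a large array in slow memory is first staged into a fast-memory buffer and then processed from there:

    #include <algorithm>
    #include <cstddef>
    #include <cstring>

    // Hypothetical chunk size: the citation says "a few GB" per chunk.
    constexpr std::size_t kChunkBytes = 2ULL << 30;

    // Placeholder kernel: reads and writes its chunk in fast memory only.
    void process(double* chunk, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) chunk[i] *= 2.0;
    }

    // Staged access: copy one chunk from slow to fast memory, compute on it,
    // write it back, then move on to the next chunk.
    void staged_access(double* slow_data, std::size_t total_elems,
                       double* fast_buffer /* resides in fast memory */) {
        const std::size_t chunk_elems = kChunkBytes / sizeof(double);
        for (std::size_t off = 0; off < total_elems; off += chunk_elems) {
            const std::size_t n = std::min(chunk_elems, total_elems - off);
            std::memcpy(fast_buffer, slow_data + off, n * sizeof(double)); // stage in
            process(fast_buffer, n);                                       // compute
            std::memcpy(slow_data + off, fast_buffer, n * sizeof(double)); // stage out
        }
    }

On Knights Landing, fast_buffer would typically be allocated in MCDRAM, for example with hbw_malloc from the memkind library; whether the citing paper stages data this way is an assumption here.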
“…Here, to simplify the explanation, we show the sequential code; in our evaluation, the code is actually parallelized with OpenMP. We parallelize it as follows: (1) to abort OpenMP parallel for loops partway through, we use the cancel for directive; (2) to minimize communication among threads, we keep each filter's statistics in private variables and collect them with atomic updates just after the sampling ends.…”
Citation type: mentioning
Confidence: 99%
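A hedged sketch of the parallelization pattern this statement describes (the filter predicate, sample count, and abort threshold below are hypothetical): each thread keeps its statistics in private variables, an early-exit condition triggers cancel for, and the private partials are merged with atomic updates right after the loop.

    #include <cstdio>

    int main() {
        const long kSamples = 100000000L;
        long global_pass = 0, global_fail = 0;   // overall filter statistics

        #pragma omp parallel
        {
            long pass = 0, fail = 0;             // (2) thread-private statistics

            #pragma omp for
            for (long i = 0; i < kSamples; ++i) {
                if (i % 3 == 0) ++pass; else ++fail;   // hypothetical filter

                // (1) request early termination of the whole worksharing loop.
                if (pass + fail > 1000000) {
                    #pragma omp cancel for
                }
                #pragma omp cancellation point for
            }

            // Collect the private statistics just after the sampling ends.
            #pragma omp atomic
            global_pass += pass;
            #pragma omp atomic
            global_fail += fail;
        }

        std::printf("pass=%ld fail=%ld\n", global_pass, global_fail);
        return 0;
    }

Note that OpenMP cancellation only takes effect when the program runs with OMP_CANCELLATION=true; otherwise the cancel request is ignored and the full loop executes, which keeps the sketch correct either way.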