2016 45th International Conference on Parallel Processing Workshops (ICPPW) 2016
DOI: 10.1109/icppw.2016.39

Using HPX and OP2 for Improving Parallel Scaling Performance of Unstructured Grid Applications

Cited by 5 publications (5 citation statements)
References 12 publications
“…In this research, we study an Airfoil application, which is a standard unstructured mesh finite volume computational fluid dynamics (CFD) code, presented in [15], for the turbomachinery simulation and consists of over 720K nodes and about 1.5 million edges. As described in [15] and [16], it consists of five parallel loops:…”
Section: B Airfoil Application (mentioning)
confidence: 99%
“…In this research different dynamic optimizations are proposed for improving the performance of code generated by the OP2 compiler that are implemented using HPX runtime system, which has been developed to overcome limitations such as global barriers and poor latency hiding [9], [10] by embracing new ways of coordinating parallel execution, controlling synchronization, and implementing latency hiding utilizing Local Control Objects (LCO) [16], [18]. These objects have the ability to create, resume, or suspend a thread when triggered by one or more events.…”
Section: HPX (mentioning)
confidence: 99%
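The statement above describes objects that suspend or resume a thread when an event fires. As a rough illustration only, the sketch below uses the standard C++ `std::promise`/`std::future` pair as a stand-in for that idea. It does not use HPX, and the helper name `wait_for_event` is hypothetical.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper (not an HPX API): a consumer thread is suspended on a
// future until the producer triggers the matching promise with a value.
int wait_for_event(int payload) {
    std::promise<int> event;                      // the "trigger" side
    std::future<int> triggered = event.get_future();
    assert(triggered.valid());

    int observed = 0;
    // The consumer blocks inside get() until the event is set, then resumes.
    std::thread consumer([&observed, &triggered] {
        observed = triggered.get() * 2;
    });

    event.set_value(payload);                     // fire the event
    consumer.join();                              // wait for the consumer to finish
    return observed;
}
```

Calling `wait_for_event(21)` sets the event with 21, and the resumed consumer doubles it. The point of the design is that the consumer waits on one specific event rather than on a global barrier.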
“…Chunk size is the amount of work performed by each task [12,13] that is determined by an auto_partitioner exposed by the HPX algorithms or is passed by using static/dynamic_chunk_size as an execution policy's parameter [10]. However, (1) the experimental results in [4] and [3] showed that the overheads of determining chunk size by using the auto_partitioner negatively affected the application's scalability in some cases; (2) the policy written by the user will often not be able to determine the optimum chunk size either due to the limit of runtime information. • In [14], we proposed the HPX prefetching method which aids prefetching that not only reduces the memory accesses latency, but also relaxes the global barrier.…”
Section: Introduction (mentioning)
confidence: 99%
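To make the notion of chunk size in the statement above concrete, here is a minimal sketch that splits a reduction into tasks of at most `chunk_size` elements each. It uses only the C++ standard library (`std::async`), not HPX or its auto_partitioner, and the names `chunked_sum` and `task_count` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper (not an HPX API): number of tasks needed when each
// task handles at most `chunk_size` elements out of `n`.
std::size_t task_count(std::size_t n, std::size_t chunk_size) {
    assert(chunk_size > 0);
    return (n + chunk_size - 1) / chunk_size;     // ceiling division
}

// Hypothetical helper: sums `data` by launching one task per chunk.
long long chunked_sum(const std::vector<int>& data, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::future<long long>> tasks;
    tasks.reserve(task_count(data.size(), chunk_size));

    for (std::size_t begin = 0; begin < data.size(); begin += chunk_size) {
        const std::size_t end = std::min(begin + chunk_size, data.size());
        tasks.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);   // work done by one task
        }));
    }

    long long total = 0;
    for (auto& task : tasks) total += task.get();       // combine partial sums
    return total;
}
```

A small chunk size creates many short tasks and raises scheduling overhead, while a large one creates few tasks and can leave cores idle. That trade-off is why the choice of chunk size matters for scalability.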
“…While runtime adaptive methods have been shown to be very effective, especially for highly dynamic scenarios, solely relying on them doesn't guarantee maximal parallel performance, since the performance of an application depends on both the values measured at runtime and the related transformations performed at compile time. Collecting the outcome of the static analysis performed by the compiler could significantly improve runtime decisions and therefore application performance [1][2][3][4].…”
Section: Introduction (mentioning)
confidence: 99%