2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) 2016
DOI: 10.1109/ipdpsw.2016.181

Optimizing Chapel for Single-Node Environments

Cited by 8 publications (7 citation statements)
References 8 publications
“…Acquiring a lock is equivalent to reading the sync variable and releasing a lock is equivalent to writing to the sync variable. Similar approaches have been taken elsewhere to create arrays of OpenMP locks in Chapel [5]. This technique was functionally correct, but resulted in a significant loss of performance for our application, as discussed in Section V-D.…”
Section: A Mutex Pool (mentioning)
confidence: 88%
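The lock-as-sync-variable idea quoted above can be sketched outside Chapel. The following is a minimal Python analogy, not the citing authors' code: a one-slot `queue.Queue` stands in for a full/empty sync variable, so a blocking read acquires the lock and a write releases it. The names `SyncVarLock`, `MutexPool`, `lock_for`, and `run_demo`, along with the modulo mapping from data index to lock, are hypothetical and chosen only for illustration.

```python
import queue
import threading


class SyncVarLock:
    """One-slot full/empty cell used as a lock (analogy to a sync variable)."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._slot.put(True)  # start "full", which means unlocked

    def acquire(self):
        # Reading blocks until the slot is full and leaves it empty.
        self._slot.get()

    def release(self):
        # Writing refills the slot and wakes one blocked reader.
        self._slot.put(True)


class MutexPool:
    """Fixed-size array of locks shared by a larger set of data items."""

    def __init__(self, size):
        self._locks = [SyncVarLock() for _ in range(size)]

    def lock_for(self, index):
        # Assumed mapping: several items may share one lock.
        return self._locks[index % len(self._locks)]


def run_demo(num_threads=4, increments=1000, pool_size=8, num_cells=16):
    """Increment shared cells from several threads under the pool's locks."""
    pool = MutexPool(pool_size)
    cells = [0] * num_cells

    def worker():
        for i in range(increments):
            idx = i % num_cells
            lock = pool.lock_for(idx)
            lock.acquire()
            try:
                cells[idx] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cells
```

Every acquire and release passes through a blocking queue operation, which hints at why such a scheme can be functionally correct yet cost more than a native lock; the sketch makes no claim about the overhead the citing paper measured.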
“…There has been a significant effort to evaluate and analyze the performance of Chapel programs for both single- and multi-node environments. Johnson and Hollingsworth ported and optimized several C/OpenMP based benchmarks to single-node Chapel including LULESH, MiniMD, and CLOMP [5]. Haque and Richards implemented an optimized multi-node version of CoMD in Chapel as well as identified key limitations of Chapel in regards to scope-based code locality [6].…”
Section: Related Work (mentioning)
confidence: 99%
“…In addition, each of these tasks is going to execute on a remote locale using the on clause. Then, locale-specific variables are created, such as termination detection flags and state vector (lines 10-12). A second coforall loop-based tasking construct is then used to exploit the intra-node parallel level, creating as many tasks as threads per locale (line 13).…”
Section: Parallel Distributed DFS in Chapel (mentioning)
confidence: 99%