2018
DOI: 10.3997/2214-4609.201803072

Auto-Tuning of 3D Acoustic Wave Propagation in Shared Memory Environments

Cited by 4 publications
(10 citation statements)
References 0 publications
“…4 shows that using the OpenMP dynamic scheduler without specifying the chunk size leads to a significantly higher number of cache misses when compared to the other schedulers. This result, as well as the experiments presented by [23], [29], [30], shows that using the OpenMP dynamic scheduler with the default unitary chunk size and other very small chunk sizes for the RTM leads to loss of performance. The reasons for that are the high number of cache misses due to false sharing, and the overhead to manage the distribution of tasks.…”
Section: Cache Misses Analysis (supporting)
confidence: 60%
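
The two costs named in the statement above (task-distribution overhead and false sharing) can be illustrated with a toy model. This is a minimal sketch and not code from the cited papers: the function names `dynamic_chunks` and `lines_split_across_chunks` are hypothetical, and the model assumes that iteration `i` writes element `i` of a contiguous array with 16 elements per cache line (for example 4-byte floats on 64-byte lines). A line split across chunks is only a candidate for false sharing, since the chunks may still land on the same thread.

```python
def dynamic_chunks(n_iters, chunk):
    """Return the (start, stop) iteration ranges a dynamic scheduler would hand out.

    Each range corresponds to one scheduling request, so the length of the
    returned list is a proxy for the task-distribution overhead.
    """
    return [(s, min(s + chunk, n_iters)) for s in range(0, n_iters, chunk)]


def lines_split_across_chunks(n_iters, chunk, elems_per_line=16):
    """Count cache lines whose elements fall in more than one chunk.

    Assumption (illustrative only): iteration i touches array element i,
    and elems_per_line consecutive elements share one cache line.
    """
    chunks_per_line = {}
    for idx, (start, stop) in enumerate(dynamic_chunks(n_iters, chunk)):
        first_line = start // elems_per_line
        last_line = (stop - 1) // elems_per_line
        for line in range(first_line, last_line + 1):
            chunks_per_line.setdefault(line, set()).add(idx)
    return sum(1 for owners in chunks_per_line.values() if len(owners) > 1)


if __name__ == "__main__":
    for chunk in (1, 16, 64):
        print(chunk,
              len(dynamic_chunks(1024, chunk)),
              lines_split_across_chunks(1024, chunk))
```

Under these assumptions, a chunk size of 1 produces one scheduling request per iteration and splits every cache line across several chunks, while a chunk size that is a multiple of the line length splits none.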
“…For each CSA iteration, each optimizer only measures the execution time of the first time step in the forward propagation, using its current chunk size (Lines 6 and 13). As shown in [23], the run time of the first time step can accurately represent the total propagation execution time. This first-time step is performed twice (Line 4) and only the elapsed time of the second repetition is registered (Lines 5 and 12) in order to avoid cache population effects.…”
Section: CSA-based Auto-tuning (mentioning)
confidence: 99%
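
The measurement scheme described in the statement above (run the first time step twice, record only the second run) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the cited implementation: `step`, `time_second_run`, and `pick_chunk` are hypothetical names, `step(chunk)` stands in for one forward-propagation time step, and the plain loop over candidate chunk sizes is a stand-in for the CSA optimizers, which are not reproduced here.

```python
import time


def time_second_run(step, chunk, clock=time.perf_counter):
    """Run step(chunk) twice and return the elapsed time of the second run only.

    The first call is a warm-up that populates the caches; its duration
    is deliberately not measured.
    """
    step(chunk)                 # warm-up run, not timed
    start = clock()
    step(chunk)                 # measured run
    return clock() - start


def pick_chunk(step, candidates, clock=time.perf_counter):
    """Time each candidate chunk size and return (best_chunk, timings).

    Exhaustive search is used here only for illustration; the cited work
    explores chunk sizes with coupled simulated annealing instead.
    """
    timings = {c: time_second_run(step, c, clock) for c in candidates}
    best = min(timings, key=timings.get)
    return best, timings
```

The `clock` parameter exists only so the timing source can be swapped out, for example with a deterministic counter when checking the logic.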
“…[Flattened comparison table. Column headings, as far as they can be rejoined from the extraction: decentralized; work-stealing; asynchronous communication; one-sided communication; PGAS; MPI RMA; WS with global load information; RTM; distributed memory. The column position of each mark was lost, so the marks are listed per row as extracted.]
Barros et al [24]
Andreolli et al [26]
Andreolli et al [27]
Sena et al [28]: x
Hofmeyr et al [29]: x
Tchiboukdjian et al [30]: x
Imam and Sarkar [31]: x x x
Khaitan et al [32]: x
Tesser et al [33]: x
Tesser et al [34]: x
Tesser et al [35]: x
Padoin et al [36]: x
Padoin et al [37]: x
Sharma and Kanungo [38]: x x x
Zheng et al [39]: x x x
Martinez et al [40]: x x x x
Khaitan and Mccalley [41]: x x x x
Mor and Maillard [42]: x x x x
Li et al [43]: x x x x x x
Kumar et al [44]: x x x x x x
Dinan et al [21]: x x x x x x
Vishnu and Agarwal [49]: x…”
Section: Discussion (mentioning)
confidence: 99%
“…Several authors have proposed strategies to address the load imbalance for shared memory systems. Barros et al [24] introduced a runtime method based on coupled simulated annealing (CSA) [25] to auto-tune the workload … [numeric table residue omitted] Table 2: Steal attempts varyi...…”
Section: Related Work (mentioning)
confidence: 99%