High Performance Parallelism Pearls 2015
DOI: 10.1016/b978-0-12-802118-7.00023-6

Characterization and Optimization Methodology Applied to Stencil Computations

Cited by 21 publications (34 citation statements)
References 2 publications
“…Columns: RTM | distributed memory | decentralized | work-stealing | asynchronous communication | one-sided communication | PGAS | MPI RMA | WS with global load information
Barros et al [24]
Andreolli et al [26]
Andreolli et al [27]
Sena et al [28] x
Hofmeyr et al [29] x
Tchiboukdjian et al [30] x
Imam and Sarkar [31] x x x
Khaitan et al [32] x
Tesser et al [33] x
Tesser et al [34] x
Tesser et al [35] x
Padoin et al [36] x
Padoin et al [37] x
Sharma and Kanungo [38] x x x
Zheng et al [39] x x x
Martinez et al [40] x x x x
Khaitan and Mccalley [41] x x x x
Mor and Maillard [42] x x x x
Li et al [43] x x x x x x
Kumar et al [44] x x x x x x
Dinan et al [21] x x x x x x
Vishnu and Agarwal [49] x…”
Section: Discussion (mentioning)
confidence: 99%
“…It is important to note here that there is an instruction execution overhead that the above calculations did not take into account and therefore these theoretical peak numbers are not achievable (80% is achievable in practice [25]). For this reason, two benchmark algorithms, STREAM TRIAD for memory bandwidth [26,27] and LINPACK for floating point performance [28], are often used to measure the practical limits of a particular hardware platform.…”
Section: Establishing the Roofline (mentioning)
confidence: 98%
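
The roofline reasoning in the statement above can be sketched in a few lines. This is a minimal illustration, not code from the cited work: the function names, parameter names, and the hardware numbers in the usage note are all hypothetical. The only value taken from the quoted text is the 80% achievable fraction of theoretical peak; in practice the two ceilings would come from measured LINPACK and STREAM TRIAD results.

```python
def roofline_gflops(peak_gflops, stream_gbs, flops_per_byte, peak_fraction=0.8):
    """Attainable GFLOP/s under a simple two-ceiling roofline.

    peak_gflops    -- theoretical peak floating point rate (assumed input)
    stream_gbs     -- sustained memory bandwidth in GB/s, e.g. from STREAM TRIAD
    flops_per_byte -- arithmetic intensity of the kernel
    peak_fraction  -- achievable share of theoretical peak (0.8 per the quoted text)
    """
    if min(peak_gflops, stream_gbs, flops_per_byte) < 0:
        raise ValueError("inputs must be non-negative")
    compute_ceiling = peak_gflops * peak_fraction   # flat part of the roof
    memory_ceiling = stream_gbs * flops_per_byte    # sloped part of the roof
    return min(compute_ceiling, memory_ceiling)


def ridge_point(peak_gflops, stream_gbs, peak_fraction=0.8):
    """Arithmetic intensity (flops/byte) at which the two ceilings meet."""
    return peak_gflops * peak_fraction / stream_gbs
```

With made-up figures of 1000 GFLOP/s peak and 100 GB/s bandwidth, a kernel at 0.5 flops/byte sits on the memory slope, while one at 16 flops/byte is capped by the compute ceiling. Low-intensity stencil kernels typically fall on the memory-bound side of the ridge point.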
“…Performance models such as the roofline model by [1] help establish statistics for best case performance - to evaluate the performance of a code in terms of hardware utilization (e.g. percentage of peak floating point performance) instead of a relative speed-up.…”
Section: Introduction (mentioning)
confidence: 99%
“…In [6], Andreolli et al focused on acoustic wave propagation equations, choosing the optimization techniques from systematically tuning the algorithm. The usage of collaborative thread blocking, cache blocking, register re-use, vectorization and loop redistribution resulted in significant performance improvements.…”
Section: Related Work (mentioning)
confidence: 99%
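
Of the techniques listed in the statement above, cache blocking is the easiest to show in isolation. The sketch below is not the implementation by Andreolli et al. It applies a generic 2D 5-point averaging stencil (an assumption chosen for brevity, not the acoustic wave kernel), and the function names and block sizes are hypothetical. It shows only the loop restructuring: the blocked version visits the same interior points tile by tile, so the result is identical to the plain sweep.

```python
def stencil_naive(grid):
    """One sweep of a 5-point averaging stencil over the interior points."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def stencil_blocked(grid, block_i=4, block_j=4):
    """Same sweep, but iterating over block_i x block_j tiles of the interior."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, block_i):          # tile origin, rows
        i_end = min(ii + block_i, n - 1)
        for jj in range(1, m - 1, block_j):      # tile origin, columns
            j_end = min(jj + block_j, m - 1)
            for i in range(ii, i_end):           # points inside one tile
                for j in range(jj, j_end):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out
```

In interpreted Python the tiling brings no speed-up. In a compiled kernel, the tile sizes would be tuned so that the working set of one tile stays resident in cache while neighbouring rows are reused.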