Proceedings of the 8th ACM International Conference on Computing Frontiers 2011
DOI: 10.1145/2016604.2016641

Understanding stencil code performance on multicore architectures

Cited by 28 publications
(30 citation statements)
References 27 publications
“…[10] combine the cache misses of only the functions that contribute significantly towards the total cache misses. The profiler TAU (Tuning and Analysis Utilities) [21] was used to obtain the PAPI (Performance Application Programming Interface) counters like PAPI_L1_DCM and PAPI_L2_DCM [14]. Table VII shows that the Z decomposition is the worst, with maximum predicted and actual cache misses.…”
Section: Results
Type: mentioning
confidence: 99%
“…Performance optimization can start with domain decomposition at the macro-level. Figure 4 illustrates that traditional optimizations only consider reducing the cache misses [9] after performing domain decomposition [10], [11], [12], [13], [14]. We take a reverse approach in the sense that we derive a domain decomposition based on optimization of cache-misses.…”
Section: Or the Finite Element Methods (FEM)
Type: mentioning
confidence: 99%
“…To illustrate the benefits of CMS, we focus on stencil algorithms because of their broad applicability, the memory bandwidth sensitivity of their kernels [36,18,12,1], and their ubiquitous usage [55]. In particular, stencil algorithms constitute a large fraction of consumer, embedded, HPC and scientific applications in such diverse areas as image processing, seismic imaging [46], heat diffusion, electromagnetics, fluid dynamics, and climate modeling [51,52,78,56]. These applications often use iterative finite-difference techniques, which sweep over a spatial grid, performing nearest neighbor computations called stencils.…”
Section: Stencil Computations
Type: mentioning
confidence: 99%
“…In a stencil operation, each point in a multi-dimensional grid is updated with weighted contributions from a subset of its neighbors in both time and space, thereby representing the coefficients of the partial differential equation (PDE) for that data element. Stencil sizes range from considering only its immediate neighbors to 9-, 13-, 21- and 27-point stencils [14,11,78,56]. Stencil calculations perform global sweeps through data structures that are typically much larger than the available data caches.…”
Section: Stencil Computations
Type: mentioning
confidence: 99%
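
As a rough sketch of the "weighted contributions from a subset of its neighbors" in the statement above, a stencil can be written as a table of neighbour offsets and weights. The weights, the names `FIVE_POINT`, `nine_point_weights`, `apply_stencil` and `grid_bytes`, and the 512-point grid size are hypothetical choices for illustration; none of them come from the cited paper.

```python
# Hypothetical 2D weight tables: (row offset, column offset) -> weight.
FIVE_POINT = {
    (0, 0): 0.5,
    (-1, 0): 0.125, (1, 0): 0.125,
    (0, -1): 0.125, (0, 1): 0.125,
}


def nine_point_weights():
    """Uniform average over the full 3x3 neighbourhood (illustrative weights)."""
    return {(di, dj): 1.0 / 9.0 for di in (-1, 0, 1) for dj in (-1, 0, 1)}


def apply_stencil(grid, weights):
    """Apply one sweep of a weighted stencil to the interior of a 2D grid.

    Boundary points (within the stencil radius of an edge) are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    radius = max(max(abs(di), abs(dj)) for di, dj in weights)
    out = [row[:] for row in grid]  # separate output buffer
    for i in range(radius, rows - radius):
        for j in range(radius, cols - radius):
            out[i][j] = sum(
                w * grid[i + di][j + dj] for (di, dj), w in weights.items()
            )
    return out


def grid_bytes(n, dims=3, bytes_per_value=8):
    """Memory footprint of one n^dims grid of 8-byte (double) values."""
    return (n ** dims) * bytes_per_value


peak = [[0.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 0.0]]
smoothed = apply_stencil(peak, FIVE_POINT)
print(smoothed[1][1])
print(grid_bytes(512))
```

Under these assumptions a single 512 x 512 x 512 grid of doubles occupies 2^30 bytes (1 GiB), far more than a per-core data cache measured in kilobytes or a few megabytes, which is the situation the statement describes.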