Proceedings of the 2005 Workshop on Memory System Performance 2005
DOI: 10.1145/1111583.1111589

Impact of modern memory subsystems on cache optimizations for stencil computations

Abstract: In this work we investigate the impact of evolving memory system features, such as large on-chip caches, automatic prefetch, and the growing distance to main memory on 3D stencil computations. These calculations form the basis for a wide range of scientific applications from simple Jacobi iterations to complex multigrid and block structured adaptive PDE solvers. First we develop a simple benchmark to evaluate the effectiveness of prefetching in cache-based memory systems. Next we present a small parameterized …

Cited by 86 publications (107 citation statements)
References 7 publications
“…We also tried not tiling the inner loop. Kamil et al [9] recommend not tiling the inner loop because of the prefetcher. We tried multiple tile sizes and not tiling the inner loop was the best strategy for all the methods except smashing on the 4000 size domain.…”
Section: Results (mentioning)
confidence: 99%
“…In order to compute the central point of the stencil, a set of neighbors has to be accessed. Some of these neighbor points are distant in the memory hierarchy, requiring many cycles in latencies to be accessed [5]. Secondly, the low computational intensity and reuse ratios.…”
Section: Boosting Numerical Codes (mentioning)
confidence: 99%
“…Space blocking algorithms promote data reuse by traversing data in a specific order. Space blocking is especially useful when the dataset structure does not fit into the memory hierarchy [12,5]. Time blocking algorithms [8] perform loop unrolling over time-step sweeps to exploit the grid points as much as possible, and thus increase data reuse.…”
Section: Boosting Numerical Codes (mentioning)
confidence: 99%
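
As a rough illustration of the space blocking described in the statement above, the sketch below applies one sweep of a 7-point 3D Jacobi stencil twice: once in plain loop order and once with the two outer dimensions tiled while the unit-stride inner loop is left untiled, which is the strategy the first citation statement attributes to Kamil et al. The function names, the flattened array layout, and the tile sizes `tk` and `tj` are assumptions made for this example and are not taken from the paper. Pure Python shows only the loop structure; any cache or prefetch benefit would appear in a compiled implementation.

```python
def idx(i, j, k, ny, nx):
    """Flatten (k, j, i) into a 1D offset; i is the unit-stride dimension."""
    return (k * ny + j) * nx + i


def _update(src, dst, i, j, k, ny, nx):
    """Write the 7-point neighbor average for one interior point into dst."""
    c = idx(i, j, k, ny, nx)
    dst[c] = (src[c - 1] + src[c + 1]
              + src[c - nx] + src[c + nx]
              + src[c - nx * ny] + src[c + nx * ny]) / 6.0


def jacobi_naive(src, nz, ny, nx):
    """One Jacobi sweep over the interior in plain k-j-i order."""
    dst = list(src)  # boundary values are copied through unchanged
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                _update(src, dst, i, j, k, ny, nx)
    return dst


def jacobi_blocked(src, nz, ny, nx, tk=4, tj=4):
    """Same sweep with k and j tiled; the inner i loop is NOT tiled.

    tk and tj are hypothetical tile sizes chosen only for illustration.
    """
    dst = list(src)
    for kk in range(1, nz - 1, tk):          # tile origin in k
        for jj in range(1, ny - 1, tj):      # tile origin in j
            for k in range(kk, min(kk + tk, nz - 1)):
                for j in range(jj, min(jj + tj, ny - 1)):
                    # Full-length unit-stride loop: long contiguous
                    # streams are what a hardware prefetcher favors.
                    for i in range(1, nx - 1):
                        _update(src, dst, i, j, k, ny, nx)
    return dst
```

Both functions read only from `src` and write only to `dst`, so the blocked traversal changes the visiting order without changing the result; calling them on the same grid should return identical lists.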
“…A number of works have addressed optimizations of stencil computations on emerging multicore platforms [7], [16], [17], [6], [27], [26], [11], [37], [10], [4], [9], [40], [38], [41], [8], [39]. In addition, other transformations such as tiling of stencil computations for multicore architectures have been addressed in [43], [25], [21], [34].…”
Section: Related Work (mentioning)
confidence: 99%