SC 2008 - International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: 10.1109/sc.2008.5222004

Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures

Abstract: Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions that the mainstream and scientific computing industries have faced in several decades. Our work explores multicore stencil (nearest-neighbor) computations, a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a number of effective optimization strategies, and build an auto-tuning environment that searches over our optimizations and their parameter…
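To make the paper's subject concrete, here is a minimal sketch of the kind of nearest-neighbor kernel such work tunes: a naive 7-point sweep over a 3D grid. The function name, the coefficients c0 and c1, and the one-cell ghost layer are illustrative assumptions, not the paper's benchmark code.

```c
#include <stddef.h>

/* Naive 7-point 3D stencil sweep (illustrative sketch; coefficients and
 * layout are assumptions, not the paper's benchmark).  Each interior
 * point of `in` is combined with its six face neighbors and written to
 * `out`; a one-cell ghost layer around the domain is assumed. */
static void stencil7_naive(const double *in, double *out,
                           size_t nx, size_t ny, size_t nz,
                           double c0, double c1)
{
#define IDX(i, j, k) ((size_t)(k) * ny * nx + (size_t)(j) * nx + (size_t)(i))
    for (size_t k = 1; k + 1 < nz; ++k)
        for (size_t j = 1; j + 1 < ny; ++j)
            for (size_t i = 1; i + 1 < nx; ++i)
                out[IDX(i, j, k)] =
                    c0 * in[IDX(i, j, k)] +
                    c1 * (in[IDX(i - 1, j, k)] + in[IDX(i + 1, j, k)] +
                          in[IDX(i, j - 1, k)] + in[IDX(i, j + 1, k)] +
                          in[IDX(i, j, k - 1)] + in[IDX(i, j, k + 1)]);
#undef IDX
}
```

An auto-tuner in the spirit of the paper generates many variants of such a loop nest (e.g. blocked, unrolled, software-prefetched, NUMA-aware) and times them on each target machine to pick the fastest.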

Cited by 378 publications (408 citation statements)
References 14 publications

“…This hand-tuning rarely generalizes well to new hardware generations or different input domains, is prone to error, results in unmaintainable code, and does not even guarantee optimal performance. One of the reasons is that GPU kernels can yield staggeringly large optimization spaces [Datta et al, 2008]. The problem is further compounded by the fact that these spaces can be highly discontinuous [Ryoo et al, 2008], difficult to explore, and optimal performance is often realized at the edge of "performance cliffs" induced by hard device-specific constraints (e.g.…”
Section: Motivation (mentioning)
confidence: 99%
“…Two major auto-tuning approaches have emerged in the extensive literature covering the subject (see surveys in e.g. [Vuduc et al, 2001, Williams, 2008, Datta et al, 2008, Cavazos, 2008, Li et al, 2009, Park et al, 2011]): analytical model-driven optimization and empirical optimization [Yotov et al, 2003].…”
Section: Auto-tuning (mentioning)
confidence: 99%
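To ground the distinction in the quote, the toy harness below sketches the empirical side: run candidate configurations on the actual machine, time them, and keep the fastest. Everything here (the kernel_fn type, the candidate list, timing with a POSIX monotonic clock) is an assumed illustration, not an interface from the cited works.

```c
#include <float.h>
#include <stdio.h>
#include <time.h>

/* Toy empirical auto-tuner (assumed example, not code from the cited
 * papers): try each candidate block size for some tunable kernel,
 * time it on the actual machine, and keep the fastest setting. */
typedef void (*kernel_fn)(int block);   /* hypothetical tunable kernel */

static double time_kernel(kernel_fn run, int block)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    run(block);                         /* measure, rather than model, the cost */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
}

static int tune_block_size(kernel_fn run, const int *candidates, int n)
{
    int best = candidates[0];
    double best_time = DBL_MAX;
    for (int i = 0; i < n; ++i) {
        double t = time_kernel(run, candidates[i]);
        printf("block=%d  %.6f s\n", candidates[i], t);
        if (t < best_time) { best_time = t; best = candidates[i]; }
    }
    return best;    /* the setting an analytical model would have to predict */
}
```

A model-driven tuner would instead predict the best setting from machine parameters (cache sizes, bandwidth, latency) without running the candidates; empirical search trades modeling effort for benchmarking time.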
“…By looping over all cells, it generates the entire S^(n). This is analogous to stencil computation for solving partial differential equations, in which a stencil (equivalent to a computation pattern in our framework) defines local computation rules for each grid point and its neighbor grid points, and the stencil is applied to all grid points in a lattice to solve the problem [27,28].…”
Section: Uniform-cell Pattern MD (mentioning)
confidence: 99%
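The analogy in the quote, a stencil as a local computation pattern applied uniformly at every grid point, can be sketched as follows; the offset/weight representation and the names below are illustrative assumptions, not the cited framework's data structures.

```c
/* A stencil expressed as an explicit "computation pattern": a set of
 * neighbor offsets and weights applied with the same rule at every
 * interior grid point (illustrative sketch only). */
typedef struct {
    int    di, dj;     /* neighbor offset relative to the current point */
    double weight;     /* coefficient applied to that neighbor's value  */
} stencil_point;

static void apply_pattern(const double *in, double *out, long nx, long ny,
                          const stencil_point *pat, int npts)
{
    for (long j = 1; j < ny - 1; ++j)
        for (long i = 1; i < nx - 1; ++i) {
            double acc = 0.0;
            for (int p = 0; p < npts; ++p)       /* same rule at every cell */
                acc += pat[p].weight *
                       in[(j + pat[p].dj) * nx + (i + pat[p].di)];
            out[j * nx + i] = acc;
        }
}

/* Example pattern: the classic 2D 5-point Laplacian stencil. */
static const stencil_point five_point[5] = {
    {0, 0, -4.0}, {-1, 0, 1.0}, {1, 0, 1.0}, {0, -1, 1.0}, {0, 1, 1.0}
};
```

Calling apply_pattern(in, out, nx, ny, five_point, 5) performs one Jacobi-style sweep; swapping in a different pattern changes the computation rule without touching the traversal.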
“…A number of works have addressed optimizations of stencil computations on emerging multicore platforms [7], [16], [17], [6], [27], [26], [11], [37], [10], [4], [9], [40], [38], [41], [8], [39]. In addition, other transformations such as tiling of stencil computations for multicore architectures have been addressed in [43], [25], [21], [34].…”
Section: Related Work (mentioning)
confidence: 99%
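As a minimal illustration of the tiling mentioned in the passage above (the tile sizes and loop structure are assumptions for this sketch, not any specific cited scheme), a 2D 5-point sweep can be blocked so that each tile's working set stays cache-resident:

```c
/* Cache-blocked (tiled) 2D 5-point stencil sweep.  The outer loops walk
 * over TI x TJ tiles of the grid; the inner loops update one tile at a
 * time so its data is reused from cache.  The tile sizes are assumed
 * values that an auto-tuner would search over. */
#define TI 64
#define TJ 64

static void stencil5_tiled(const double *in, double *out, long nx, long ny)
{
    for (long jj = 1; jj < ny - 1; jj += TJ)
        for (long ii = 1; ii < nx - 1; ii += TI) {
            long jmax = (jj + TJ < ny - 1) ? jj + TJ : ny - 1;
            long imax = (ii + TI < nx - 1) ? ii + TI : nx - 1;
            for (long j = jj; j < jmax; ++j)      /* points inside this tile */
                for (long i = ii; i < imax; ++i)
                    out[j * nx + i] =
                        -4.0 * in[j * nx + i] +
                        in[j * nx + (i - 1)] + in[j * nx + (i + 1)] +
                        in[(j - 1) * nx + i] + in[(j + 1) * nx + i];
        }
}
```

Temporal blocking (time skewing) across successive sweeps, which several of the works cited above address, extends the same idea along the time dimension.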