Automatically enhancing locality for tree traversals with traversal splicing

Jo, Youngjoon; Kulkarni, Milind

doi:10.1145/2398857.2384643

“…In recent years, there has been more work targeting locality transformations for recursive programs, including dynamic pointer alignment [32], point blocking [20], traversal splicing [21]. These transformations are, effectively, variants of loop tiling in various ways for recursion that is nested within a normal, iterative iteration space (specifically, for loops).…”

Section: Related Workmentioning

confidence: 99%

“…In the past five years, there have been some attempts to expand the space of locality-enhancing transformations to work on recursive iteration spaces 1 . For example, Jo and Kulkarni look at transformations on recursive iteration spaces nested within an iterative iteration space, in the context of repeated traversals of trees [20,21]; the outer loop iterates over points, and the inner recursive traversal of the tree generates a recursive iteration space. Rajbhandari et al look at fusing two (or more) recursive iteration spaces, in an analog of loop fusion [25,26]; here the separate recursive iteration spaces represent separate traversals of a kd-tree, and the fusion transformation yields a single recursive traversal of the tree.…”

Section: Introductionmentioning

confidence: 99%

Locality Transformations for Nested Recursive Iteration Spaces

Sundararajah

¹

,

Sakka

²

,

Kulkarni

³

2017

SIGOPS Oper. Syst. Rev.

Self Cite

0

View full text Add to dashboard Cite

There has been a significant amount of effort invested in designing scheduling transformations such as loop tiling and loop fusion that rearrange the execution of dynamic instances of loop nests to place operations that access the same data close together temporally. In recent years, there has been interest in designing similar transformations that operate on recursive programs, but until now these transformations have only considered simple scenarios: multiple recursions to be fused, or a recursion nested inside a simple loop. This paper develops the first set of scheduling transformations for nested recursions: recursive methods that call other recursive methods. These are the recursive analog to nested loops. We present a transformation called recursion twisting that automatically improves locality at all levels of the memory hierarchy, and show that this transformation can yield substantial performance improvements across several benchmarks that exhibit nested recursion.

show abstract

“…For Barnes-Hut alone, researchers have proposed sorting using space-filling curves [2], Zcurves [9], orthogonal bisection [26], or the structure of the Barnes-Hut tree itself [28]. For ray tracers, researchers have suggested various ray-reorganization techniques [21,20,19,1,18] Rather than devising new sorting strategies for each new traversal algorithm, several researchers have looked at using the past behavior of computations to predict their future tree accesses, and hence dynamically schedule them with minimal application-specific knowledge [31,22,13,12]. Most directly relevant, as they target the same types of algorithms as this paper, is Jo and Kulkarni's traversal splicing work [13,12].…”

Section: Prior Sorting Heuristicsmentioning

confidence: 99%

“…The insight that past traversal behavior is correlated with future behavior has been exploited before, in narrower, or applicationspecific contexts by Zhang et al [31] and Pingali et al [22], and in a more general tree-traversal context by Jo and Kulkarni [13,12]. Fundamentally, these approaches all interleave the scheduling component (tracking past behavior and reorganizing computations based on that behavior) with the execution (perform the next phase of computations).…”

Section: Introductionmentioning

confidence: 99%

Hybrid CPU-GPU scheduling and execution of tree traversals

Liu

¹

,

Hegde

²

,

Kulkarni

³

2016

Proceedings of the 2016 International Conference on Supercomputing

Self Cite

View full text Add to dashboard Cite

GPUs offer the promise of massive, power-efficient parallelism. However, exploiting this parallelism requires code to be carefully structured to deal with the limitations of the SIMT execution model. In recent years, there has been much interest in mapping irregular applications to GPUs: applications with unpredictable, datadependent behaviors. While most of the work in this space has focused on ad hoc implementations of specific algorithms, recent work has looked at generic techniques for mapping a large class of tree traversal algorithms to GPUs, through careful restructuring of the tree traversal algorithms to make them behave more regularly. Unfortunately, even this general approach for GPU execution of tree traversal algorithms is reliant on ad hoc, handwritten , algorithm-specific scheduling (i.e., assignment of threads to warps) to achieve high performance. The key challenge of scheduling is that it is a highly irregular process, that requires the inspection of thread behavior and then careful sorting of those threads into warps. In this paper, we present a novel scheduling and execution technique for tree traversal algorithms that is both general and automatic. The key novelty is a hybrid, inspector-executor approach: the GPU partially executes tasks to inspect thread behavior and transmits information back to the CPU, which uses that information to perform the scheduling itself, before executing the remaining, carefully scheduled, portion of the traversals on the GPU. We applied this framework to six tree traversal algorithms, achieving significant speedups over optimized GPU code that does not perform application-specific scheduling. Further, we show that in many cases, our hybrid approach is able to deliver better performance even than GPU code that uses handtuned, application-specific scheduling.

show abstract

“…In the past five years, there have been some attempts to expand the space of locality-enhancing transformations to work on recursive iteration spaces 1 . For example, Jo and Kulkarni look at transformations on recursive iteration spaces nested within an iterative iteration space, in the context of repeated traversals of trees [20,21]; the outer loop iterates over points, and the inner recursive traversal of the tree generates a recursive iteration space. Rajbhandari et al look at fusing two (or more) recursive iteration spaces, in an analog of loop fusion [25,26]; here the separate recursive iteration spaces represent separate traversals of a kd-tree, and the fusion transformation yields a single recursive traversal of the tree.…”

Section: Introductionmentioning

confidence: 99%

Locality Transformations for Nested Recursive Iteration Spaces

Sundararajah

¹

,

Sakka

²

,

Kulkarni

³

2017

Self Cite

View full text Add to dashboard Cite

There has been a significant amount of effort invested in designing scheduling transformations such as loop tiling and loop fusion that rearrange the execution of dynamic instances of loop nests to place operations that access the same data close together temporally. In recent years, there has been interest in designing similar transformations that operate on recursive programs, but until now these transformations have only considered simple scenarios: multiple recursions to be fused, or a recursion nested inside a simple loop. This paper develops the first set of scheduling transformations for nested recursions: recursive methods that call other recursive methods. These are the recursive analog to nested loops. We present a transformation called recursion twisting that automatically improves locality at all levels of the memory hierarchy, and show that this transformation can yield substantial performance improvements across several benchmarks that exhibit nested recursion.

show abstract

Automatically enhancing locality for tree traversals with traversal splicing

Cited by 6 publications

References 39 publications

Locality Transformations for Nested Recursive Iteration Spaces

Locality Transformations for Nested Recursive Iteration Spaces

Hybrid CPU-GPU scheduling and execution of tree traversals

Locality Transformations for Nested Recursive Iteration Spaces

Contact Info

Product

Resources

About