Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions

Sahoo, Sanjaya Kumar; Krishnamoorthy, Sriram; Panuganti, Rajkiran; Sadayappan, P.

doi:10.1109/sc.2005.35

Cited by 9 publications

(10 citation statements)

References 40 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The key decisions to be made are 1) determining the loop structure, comprising of fused and tiled loops, 2) the placement of memory allocation and data movement statements for each of the memory hierarchy levels, and 3) the tile sizes. It has been shown previously that the the space of possible choices is exponentially large [33]. Besides considering additional levels in the memory hierarchy in determining fusion structures, we focus on the prescriptive approaches to choosing candidate loop fusions without requiring expensive optimization procedures.…”

Section: Problem Statement and Notationmentioning

confidence: 99%

“…from disks to main memory). A detailed description is available from an earlier publication [33]. In a tensor contraction, the indices can be grouped into those that are contracted and the remaining indices in the two input tensors.…”

Section: Single Contraction Optimizationmentioning

confidence: 99%

“…It can be shown that to minimize the data movement costs, the smallest array is retained in memory, with the other two arrays being streamed. This results in a memory cost of [33]:…”

Section: Single Contraction Optimizationmentioning

confidence: 99%

“…Our work is specifically in the context of tensor contractions [33,14], which can be viewed as generalized matrix products. Sequences of tensor contractions arise in several domains, particularly, in ab initio computational models in quantum chemistry.…”

Section: Introductionmentioning

confidence: 99%

“…Specifically, we consider data movement costs under memory constraints. For systems with only two levels of memory hierarchy, this problem has been addressed before by Sahoo et al [33].…”

Section: Introductionmentioning

confidence: 99%

See 4 more Smart Citations

Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies

Krishnamoorthy

Agrawal

2011

Lecture Notes in Computer Science

View full text Add to dashboard Cite

Abstract. Modern architectures are characterized by deeper levels of memory hierarchy, often explicitly addressable. Optimizing applications for such architectures requires careful management of the data movement across all these levels. In this paper, we focus on the problem of mapping tensor contractions to memory hierarchies with more than two levels, specifically addressing placement of memory allocation and data movement statements, choice of loop fusions, and tile size selection. Existing algorithms to find an integrated solution to this problem even for two-level memory hierarchies have been shown to be expensive. We improve upon this work by focusing on the first-order cost components, simplifying the analysis required and reducing the number of candidates to be evaluated. We have evaluated our framework on a cluster of GPUs. Using five candidate tensor contraction expressions, we show that fusion at multiple levels improves performance, and our framework is effective in determining profitable transformations.

show abstract

Section: Problem Statement and Notationmentioning

confidence: 99%

Section: Single Contraction Optimizationmentioning

confidence: 99%

“…It can be shown that to minimize the data movement costs, the smallest array is retained in memory, with the other two arrays being streamed. This results in a memory cost of [33]:…”

Section: Single Contraction Optimizationmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

“…Specifically, we consider data movement costs under memory constraints. For systems with only two levels of memory hierarchy, this problem has been addressed before by Sahoo et al [33].…”

Section: Introductionmentioning

confidence: 99%

See 3 more Smart Citations

Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies

Krishnamoorthy

Agrawal

2011

Lecture Notes in Computer Science

View full text Add to dashboard Cite

show abstract

Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations

Gao

Krishnamoorthy

Sahoo

et al. 2006

Languages and Compilers for Parallel Computing

View full text Add to dashboard Cite

SUMMARYCompile-time optimizations involve a number of transformations such as loop permutation, fusion, tiling, array contraction etc. The selection of the appropriate transformation to minimize the execution time is a challenging task. We address this problem in the context of tensor contraction expressions involving arrays too large to fit in main memory. Domain-specific features of the computation are exploited to develop an integrated framework that facilitates the exploration of the entire search space of optimizations. In this paper, we discuss the exploration of the space of loop fusion and tiling transformations in order to minimize the disk I/O cost. These two transformations are integrated and pruning strategies are presented that significantly reduce the number of loop structures to be evaluated for subsequent transformations. The evaluation of the framework using representative contraction expressions from quantum chemistry shows a dramatic reduction in the size of the search space using the strategies presented.

show abstract

Hypergraph Partitioning for Automatic Memory Hierarchy Management

Krishnamoorthy¹,

Catalyurek²,

Nieplocha³

et al. 2006

ACM/IEEE SC 2006 Conference (SC'06)

Self Cite

View full text Add to dashboard Cite

In this paper, we present a mechanism for automatic management of the memory hierarchy, including secondary storage, in the context of a global address space parallel programming framework. The programmer specifies the parallelism and locality in the computation. The scheduling of the computation into stages, together with the movement of the associated data between secondary storage and global memory, and between global memory and local memory, is automatically managed. A novel formulation of hypergraph partitioning is used to model the optimization problem of minimizing disk I/O. Experimental evaluation of the proposed approach using a sub-computation from the quantum chemistry domain shows a reduction in the disk I/O cost by up to a factor of 11, and a reduction in turnaround time by up to 49%, as compared to alternative approaches used in state-of-the-art quantum chemistry codes.

show abstract

Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions

Cited by 9 publications

References 40 publications

Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies

Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies

Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations

Hypergraph Partitioning for Automatic Memory Hierarchy Management

Contact Info

Product

Resources

About