Array distribution in data-parallel programs (1994)
DOI: 10.1007/bfb0025872

Cited by 23 publications (11 citation statements)
References 10 publications
“…Both a fine-grain and the optimal coarse-grain static partitioning will be compared with the dynamic partitioning. In the current implementation, loop peeling is not performed on the actual code. As previously mentioned in Section 4.2, the single additional startup redistribution due to not peeling will not be significant in comparison to the execution of the loop (containing a dynamic count of 600 redistributions).…”
Section: -D Alternating Direction Implicit (ADI) Iterative Methods
Citation type: mentioning
confidence: 98%
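A minimal sketch of the peeling transformation this excerpt discusses, in plain Python with stub helpers (redistribute, sweep_x, and sweep_y are hypothetical names, not the paper's code): peeling the first iteration exposes the incoming distribution, so the compiler can elide exactly one startup redistribution out of the loop's 600.

    # Hypothetical sketch of loop peeling around an ADI-style sweep loop.
    # The helpers below are illustrative stubs, not the paper's API.

    def redistribute(a, by):   # remap the array layout across processors
        return a

    def sweep_x(a):            # solves along rows (wants a row distribution)
        return a

    def sweep_y(a):            # solves along columns (wants a column distribution)
        return a

    def adi_unpeeled(a, iters=300):
        # a arrives already distributed by rows, but inside the loop the
        # compiler cannot assume that, so every trip pays both remaps:
        # 2 * 300 = 600 redistributions, the very first one redundant.
        for _ in range(iters):
            a = redistribute(a, by="rows")   # redundant on the first trip
            a = sweep_x(a)
            a = redistribute(a, by="cols")
            a = sweep_y(a)
        return a

    def adi_peeled(a, iters=300):
        # First iteration peeled: the incoming row layout is now visible,
        # so its redistribution is elided (599 remaps instead of 600).
        a = sweep_x(a)                       # already by rows: no remap needed
        a = redistribute(a, by="cols")
        a = sweep_y(a)
        for _ in range(iters - 1):
            a = redistribute(a, by="rows")
            a = sweep_x(a)
            a = redistribute(a, by="cols")
            a = sweep_y(a)
        return a

As the excerpt notes, saving that single startup redistribution is negligible next to the 599 that remain, which is why the cited implementation skips the peeling.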
“…Nodes in the ADG represent data-parallel computations, while edges represent the flow of data. The ADG itself is formed in a divide-and-conquer approach, using heuristics to approximately solve a combinatorial minimization problem at each step, taking into account both redistribution costs and all candidate distributions, to determine where to partition the program into subphases [8]. Candidates are formed by first identifying the extents, or iteration space, of all objects in the program, resulting in a number of clusters.…”
Section: Static Partitioning
Citation type: mentioning
confidence: 99%
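For intuition about the minimization those heuristics approximate, consider the simplified case of a straight-line sequence of subphases: choosing one candidate distribution per phase to minimize compute cost plus redistribution cost reduces to a shortest-path problem over candidates. The sketch below is a hypothetical simplification, not the cited divide-and-conquer algorithm; all names and cost models are assumptions.

    def best_distributions(phases, compute_cost, redist_cost):
        # phases[i] is the list of candidate distributions for subphase i;
        # compute_cost(i, d) and redist_cost(d_from, d_to) are cost models.
        dp = {d: compute_cost(0, d) for d in phases[0]}  # cheapest way to finish phase 0 in d
        back = [{} for _ in phases]                      # back-pointers for reconstruction
        for i in range(1, len(phases)):
            new_dp = {}
            for d in phases[i]:
                prev, cost = min(((p, c + redist_cost(p, d)) for p, c in dp.items()),
                                 key=lambda t: t[1])
                new_dp[d] = cost + compute_cost(i, d)
                back[i][d] = prev
            dp = new_dp
        total = min(dp.values())
        d = min(dp, key=dp.get)              # walk back-pointers to recover the choice
        choice = [d]
        for i in range(len(phases) - 1, 0, -1):
            d = back[i][d]
            choice.append(d)
        return list(reversed(choice)), total

    # Toy use: two subphases, each either row- or column-distributed.
    dists, cost = best_distributions(
        [["rows", "cols"], ["rows", "cols"]],
        compute_cost=lambda i, d: 10 if (i, d) in {(0, "rows"), (1, "cols")} else 30,
        redist_cost=lambda p, d: 0 if p == d else 5)
    # -> (['rows', 'cols'], 25): redistributing between the phases is cheapest.

General programs are not straight-line, which is why the cited work resorts to divide-and-conquer with heuristics rather than a simple dynamic program like this one.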
“…The identification of program segments in which data can be statically mapped and the accurate modeling of the potential remapping costs make the dynamic data-mapping problem harder than the static problem. The smallest possible statically mapped program regions may be single statements [Chatterjee et al. 1993; Chatterjee et al. 1994; Philippsen 1995], loop nests [Ayguadé et al. 1994; Anderson and Lam 1993; Lee and Tsai 1993; Ning et al. 1995; Palermo and Banerjee 1995; Tandri and Abdelrahman 1997], or groups of statements or loop nests for which it can be shown that remapping between them can never be profitable [Sheffler et al. 1996]. More recent work tries to make the mapping decisions independent of the particular program structure [Kelly and Pugh 1996].…”
Section: Dynamic Mappings
Citation type: mentioning
confidence: 99%
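A back-of-the-envelope sketch of the profitability question this excerpt raises, with made-up communication counts: two consecutive loop nests prefer different distributions, and remapping between them pays off only when the remap cost is amortized over enough sweeps.

    # Toy cost comparison (all numbers hypothetical): phase 1 sweeps rows,
    # phase 2 sweeps columns, each repeated `sweeps` times on an n x n
    # array spread over P processors.
    n, P, sweeps = 64, 4, 10
    size = n * n

    def remote(phase_layout, preferred):
        # elements fetched remotely per sweep: none if the layout matches,
        # otherwise roughly the (P-1)/P fraction held on other processors
        return 0 if phase_layout == preferred else size * (P - 1) // P

    static_rows = sweeps * (remote("rows", "rows") + remote("rows", "cols"))
    dynamic = (sweeps * remote("rows", "rows")       # phase 1 under rows
               + size                                # one full remap between phases
               + sweeps * remote("cols", "cols"))    # phase 2 under cols
    print(static_rows, dynamic)   # 30720 vs 4096: remapping wins here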
“…The starting time was taken. Raw measures for transpositions are displayed. Tables 5 and 6 show the transposition times for various matrix sizes and distributions.…”
Section: Experimental Conditions
Citation type: mentioning
confidence: 99%
“…Data remapping and replication often need to be combined: a parallel matrix multiplication accesses a whole row and column of data to compute each single target element, hence the need to remap data with some replication for parallel execution. Moreover, automatic data-layout tools [24,7] suggest data remappings between computation phases. Thus, handling data remappings efficiently is an important issue for high-performance computing.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
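The row-plus-column access pattern behind that observation, in a small self-contained sketch (plain Python; the row-block distribution and processor loop are illustrative assumptions, not the cited system): each target element C[i, j] reads all of row i of A and all of column j of B, so whichever operand is not aligned with C's distribution must be remapped, with replication, to every processor that reads it.

    import numpy as np

    n, P = 4, 2
    A, B = np.random.rand(n, n), np.random.rand(n, n)
    C = np.zeros((n, n))

    # Imagine C and A block-distributed by rows over P processors. The rows
    # of A that processor p reads are local, but p touches *every* column
    # of B, so B must be remapped with replication (each processor ends up
    # holding a full copy of the columns it needs).
    for p in range(P):
        for i in range(p * n // P, (p + 1) * n // P):   # p's local rows of C
            for j in range(n):                          # all columns of B
                C[i, j] = A[i, :] @ B[:, j]             # whole row i, whole column j

    assert np.allclose(C, A @ B)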