Compiler Optimization of Dynamic Data Distributions for Distributed-Memory Multicomputers

Palermo, Daniel J.; Hodges, Eugene W.; Banerjee, P.

doi:10.1007/3-540-45403-9_13

Cited by 4 publications

(5 citation statements)

References 26 publications


“…First it considers a specific class of parallel algorithms that use macro-pipelining techniques to exhibit parallelism in matrix computations. Models and implementations of such algorithms have been proposed both for distributed memory [1,7,9,11,12,16] and shared memory machines [2]. But these works focus on data that fit into memory.…”

Section: Related Work (mentioning)

confidence: 99%


Out-of-Core Wavefront Computations with Reduced Synchronization

Clauss

Gustedt

Suter

2008

16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008)


Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed wanders in form of a 'wave' through the matrix. Macro-pipelining techniques can achieve an efficient parallelization of such algorithms by overlapping communication and computation. Usually these techniques are limited to situations where all the data to be processed fits into main memory, whereas for larger data the I/O usage pattern for external storage requires special attention. The work [5] presented a first extension of the wavefront framework to these so-called out-of-core problems. The present paper proposes a redesign of their algorithm that minimizes both overhead and perturbations coming from communications. To tackle the issue of non-contiguous I/O, we also propose an optimized data layout. These two major modifications of the original algorithm eventually allow us to present a third improvement as our implementation shortens the transition phase between two consecutive iterations of the wavefront algorithm. Experiments performed with the PARXXL library show that we can significantly reduce the time lost during inefficient I/O operations and thus obtain faster computations.
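The wavefront dependency pattern described in this abstract can be illustrated with a minimal sketch. It is not the authors' out-of-core algorithm and does not use PARXXL. The function names `wavefront_order` and `wavefront_sweep` and the simple "add the upper and left neighbours" update rule are assumptions chosen for illustration only. Cells on the same anti-diagonal do not depend on each other, which is what lets a wave advance through the matrix.

```python
def wavefront_order(rows, cols):
    """Yield the cells of a rows x cols matrix grouped by anti-diagonal.

    All cells in one group satisfy i + j == d, so none of them depends
    on another cell of the same group (hypothetical helper).
    """
    for d in range(rows + cols - 1):
        lo = max(0, d - cols + 1)
        hi = min(rows - 1, d)
        yield [(i, d - i) for i in range(lo, hi + 1)]


def wavefront_sweep(grid):
    """Apply an illustrative update: each cell adds its upper and left
    neighbours, which must already have been computed."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for front in wavefront_order(rows, cols):
        # Cells in `front` are independent and could be computed in parallel.
        for i, j in front:
            up = out[i - 1][j] if i > 0 else 0
            left = out[i][j - 1] if j > 0 else 0
            out[i][j] = grid[i][j] + up + left
    return out


if __name__ == "__main__":
    for front in wavefront_order(2, 3):
        print(front)
    print(wavefront_sweep([[1, 1], [1, 1]]))
```

In a macro-pipelined version, each processor would own a band of the matrix and forward boundary values to its neighbour as soon as a block is finished. That step is omitted here.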



“…Thereby, they allow for an overlap of computation and communication by reordering loops [8] and adding pipeline loops. These techniques can be used for several applications with wavefront computations like the ADI [9,12], Gauss-Seidel [1], SOR [11], or Sweep3D [7,16] algorithms.…”

Section: Introduction (mentioning)

confidence: 99%

Out-of-Core Wavefront Computations with Reduced Synchronization

Clauss

Gustedt

Suter

2008

16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008)


“…Wholey (42) uses hillclimbing that searches a space of possible data mappings to find the one minimizing the cost of a program segment. Palermo (34) applies branch-and-bound approach to decompose a program into a number of phases and redistributes the data between different phases. Anderson and Lam (1) describe a linear algebra framework for the global optimization of array partitioning, orientation and displacement.…”

Section: Introduction (mentioning)

confidence: 99%

Memetic Algorithms for Parallel Code Optimization

Özcan

Onbasioglu

2006

Int J Parallel Prog


Discovering the optimum number of processors and the distribution of data on distributed memory parallel computers for a given algorithm is a demanding task. A memetic algorithm (MA) is proposed here to find the best number of processors and the best data distribution method to be used for each stage of a parallel program. Steady state memetic algorithm is compared with transgenerational memetic algorithm using different crossover operators and hill-climbing methods. A self-adaptive MA is also implemented, based on a multimeme strategy. All the experiments are carried out on computationally intensive, communication intensive, and mixed problem instances. The MA performs successfully for the illustrative problem instances.


“…In this case, a redistribution has to be performed that transforms Bl's output distribution of z into the input distribution expected by B. For array variables, the redistribution operation can be selected from a redistribution library [20]. For other data structures, the programmer would have to provide appropriate redistribution operations.…”

Section: Data Redistribution (mentioning)

confidence: 99%
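As a rough illustration of what an array redistribution between two modules involves, the sketch below computes which elements must move when a one-dimensional array changes from a block distribution to a cyclic one. It is not the redistribution library cited as [20]. The function names and the choice of block and cyclic distributions are assumptions made for this example.

```python
def block_owner(i, n, p):
    """Processor owning element i when n elements are split into
    contiguous blocks over p processors (hypothetical helper)."""
    block = -(-n // p)  # ceiling of n / p
    return i // block


def cyclic_owner(i, p):
    """Processor owning element i under a cyclic (round-robin) distribution."""
    return i % p


def redistribution_messages(n, p):
    """Return {(src, dst): [indices]} for a block-to-cyclic redistribution.

    Elements whose owner does not change need no communication and are
    left out of the result.
    """
    msgs = {}
    for i in range(n):
        src, dst = block_owner(i, n, p), cyclic_owner(i, p)
        if src != dst:
            msgs.setdefault((src, dst), []).append(i)
    return msgs


if __name__ == "__main__":
    print(redistribution_messages(4, 2))
```

A real library would pack each index list into a single message per processor pair. Only the ownership bookkeeping is shown here.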

“…[23] considers the generation of array redistributions between tasks representing functional parallelism. [21] and [20] present a data-flow analysis to determine and optimize array redistributions in HPF programs. [17] presents similar algorithms for the Fortran D compiler.…”

Section: Related Work (mentioning)

confidence: 99%

Integrating library modules into special purpose parallel algorithms

Rauber

Rünger

Proceedings of PDSE '97: 2nd International Workshop on Software Engineering for Parallel and Distributed Systems


Most programs from scientific computing can benefit from the use of numerical libraries which provide efficient implementations for standard solution methods that often occur in numerical simulations. This is especially true for parallel scientific computing. A methodology that allows the integration of library functions without any additional programming effort would ease this programming style. In this paper, we address the question how to integrate library procedures into hierarchically organized parallel programs. The hierarchical structure of a specific algorithm results from a top-down decomposition into submethods which can be realized by library functions. The integration of library functions not only requires a correct specification of data dependencies between different modules but has also to take into account a possible distribution of data among the processors. We present algorithms for the adaptation of library modules such that their functional type and underlying data distribution fit into the hierarchical framework. The adaptation includes the construction of dataflow graphs that can be used to determine data distributions for the library modules such that a minimal global execution time results.

