Abstract. Recent publications have emphasised map-reduce as a general parallel programming model (labelled Google map-reduce) and described existing high-performance implementations for large data sets. We present two parallel implementations of this Google map-reduce skeleton in the parallel Haskell extension Eden: one following earlier work, and one optimised version. Eden's specific features, like lazy stream processing, dynamic reply channels, and nondeterministic stream merging, support an efficient implementation of the skeleton's complex coordination structure. We compare the two implementations of the Google map-reduce skeleton in usage and performance, and provide runtime analyses for example applications. Although very flexible, the Google map-reduce skeleton is often too general, and typical examples show better runtime behaviour with alternative skeletons.
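For reference, the skeleton's interface can be specified sequentially. The sketch below follows the usual functional formulation of Google map-reduce from the literature; the type and names are illustrative assumptions, not the paper's exact Eden interface, which parallelises this specification:

```haskell
import qualified Data.Map as Map

-- Sequential specification of the Google map-reduce interface (a sketch).
mapReduce :: Ord k2
          => (k1 -> v1 -> [(k2, v2)])   -- mapper: one input record to key/value pairs
          -> (k2 -> [v2] -> Maybe v3)   -- reducer: all intermediate values for one key
          -> Map.Map k1 v1              -- input data
          -> Map.Map k2 v3              -- output data
mapReduce mapF redF =
    Map.mapMaybeWithKey redF            -- reduce each per-key group of values
  . Map.fromListWith (++)               -- group intermediate pairs by key
  . map (\(k2, v2) -> (k2, [v2]))
  . concat
  . Map.elems
  . Map.mapWithKey mapF                 -- apply the mapper to every input record
```

For instance, word counting instantiates the mapper with a function emitting a pair (w, 1) per word w of a line, and the reducer with one that sums the counts for each word.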
Abstract. Master-worker systems are a well-known and widely applicable scheme for the parallel evaluation of a pool of tasks, a work pool. The system consists of a master process managing a set of worker processes. After an initial phase in which each worker receives a fixed number of tasks, further tasks are distributed in reply to results sent back by the workers. As this setup quickly leads to a bottleneck in the master process, the paper investigates techniques for hierarchically nesting the basic master-worker scheme. We present implementations of hierarchical master-worker skeletons and show how to automatically calculate parameters of the nested skeleton for good performance. Nesting master-worker systems is nontrivial, especially when new tasks are dynamically created from previous results (typically in breadth- or depth-first tree search algorithms). We discuss how to handle dynamically growing pools in a hierarchy and present a declarative implementation of nested master-worker systems with dynamic task creation. The skeletons are experimentally evaluated with two typical test programs. We analyse their runtime behaviour and the effects of different hierarchies on runtimes via trace visualisations.
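The flat master-worker scheme that such hierarchies nest can be expressed compactly in Eden. The sketch below is in the style of the Eden literature and assumes Eden's spawn, process and merge primitives; each worker tags its results with its id, so the nondeterministically merged reply stream doubles as the request stream driving task distribution. Results are delivered in the order in which they arrive, not in task order:

```haskell
import Control.Parallel.Eden

-- A flat master-worker skeleton (a sketch): np workers, each initially
-- given 'prefetch' tasks; every returned result triggers one new task
-- for the worker that produced it.
mw :: (Trans t, Trans r) => Int -> Int -> (t -> r) -> [t] -> [r]
mw np prefetch wf tasks = results
  where
    fromWorkers        = spawn workerProcs toWorkers
    workerProcs        = [process (zip [i,i..] . map wf) | i <- [1..np]]
    toWorkers          = distribute np tasks requests
    (newReqs, results) = (unzip . merge) fromWorkers   -- nondeterministic merge
    requests           = initialReqs ++ newReqs
    initialReqs        = concat (replicate prefetch [1..np])

-- Split the task stream according to the stream of worker requests.
distribute :: Int -> [t] -> [Int] -> [[t]]
distribute np tasks reqs = [taskList reqs tasks pe | pe <- [1..np]]
  where taskList (r:rs) (t:ts) pe
          | pe == r    = t : taskList rs ts pe
          | otherwise  =     taskList rs ts pe
        taskList _ _ _ = []
```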
Abstract. Skeletons simplify parallel programming by providing general patterns of parallel computation. When several skeletons are used inside the same program, skeleton composition usually leads to aggregation and redistribution of the intermediate data on a single process. The programmer can overcome this performance loss at a lower level of abstraction by altering the existing skeletons or not using them at all; a high-level concept like skeleton-based programming, however, calls for a more general solution. Remote data provides runtime mechanisms that allow declaratively specified processes to access other processes' data via remote handles. This enables the programmer to easily build complex skeletons by combining simpler ones. Skeletons can be composed without the drawback of collecting and then redistributing the data between two skeleton instances. Another advantage is that skeletons which inherently depend on their internal communication patterns are easily implemented using remote data. We present the implementation of remote data in the parallel functional language Eden and show the definition of some example skeletons with a remote data interface.
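In Eden, the remote data interface can be realised on top of dynamic reply channels, essentially as presented in the Eden literature: a handle of type RD a is a channel over which a channel for the actual data is exchanged. The sketch below assumes Eden's new and parfill primitives:

```haskell
import Control.Parallel.Eden

-- A remote handle: a channel for exchanging a data channel.
type RD a = ChanName (ChanName a)

-- Convert local data into a remote handle, which can be passed
-- between processes cheaply in place of the data itself.
release :: Trans a => a -> RD a
release x = new (\cc c -> parfill c x cc)

-- Convert a remote handle back into the data, which is then
-- fetched directly from the releasing process.
fetch :: Trans a => RD a -> a
fetch cc = new (\c x -> parfill cc c x)
```

With this interface, a skeleton's processes can produce values of type RD b instead of b, so that the processes of the next skeleton instance fetch their inputs directly from the producers rather than via the caller process.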
Abstract. We present a flexible skeleton for implementing distributed work pools in our parallel functional language Eden. The skeleton manages a pool of tasks (work pool) in a distributed manner, using a demand-driven work stealing approach for load balancing. All coordination is done locally within the worker processes. The latter are arranged in a ring topology and exchange additional channels to shortcut communication paths. The skeleton suits different classes of algorithms: simple data-parallel ones, standard tree search algorithms like backtracking, and algorithms using a global state as needed for branch-and-bound. Runtime experiments reveal stable runtime behaviour for the different algorithm classes, as illustrated by activity profiles (timeline diagrams). Acceptable speedups can be achieved with low effort.
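The following purely sequential sketch illustrates the per-worker decisions behind demand-driven work stealing; it is a hypothetical simplification, since the actual Eden skeleton connects such workers in a ring and ships tasks over dynamically exchanged channels:

```haskell
-- One local scheduling step of a worker (a sketch).
data Step t = Work t [t]   -- next task to evaluate and the remaining local pool
            | NeedTasks    -- pool empty: forward a steal request to the ring

localStep :: [t] -> Step t
localStep (t:ts) = Work t ts
localStep []     = NeedTasks

-- A victim answers a steal request by handing off roughly half of its
-- local pool and keeping the rest.
answerSteal :: [t] -> ([t], [t])   -- (tasks kept, tasks given to the thief)
answerSteal pool = splitAt ((length pool + 1) `div` 2) pool
```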
Abstract. The notion of Fast Fourier Transform (FFT) describes a range of efficient algorithms to compute the discrete Fourier transform, i.e. the frequency distribution of a signal. FFT plays a major role both in pure mathematical applications and in real-life scenarios such as digital signal processing. The paper investigates and compares skeleton-based parallel Haskell implementations of different FFT algorithms on workstation clusters with distributed memory. Our experiments show that the original divide-and-conquer versions suffer from an inherent input distribution and result collection problem, because huge amounts of data have to be communicated. Advanced approaches like the distributable homomorphism FFT or multidimensional FFT provide more flexibility to overcome these problems. Assuming distributed access to the input data and reorganising the computation so that the results can be returned in a distributed way leads to versions with acceptable parallel runtime behaviour.
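The divide-and-conquer structure that the parallel versions build on is the standard radix-2 Cooley-Tukey recursion, shown here as a sequential Haskell sketch (input length is assumed to be a power of two); it is the recursive splitting into even- and odd-indexed halves that forces the data movement discussed above:

```haskell
import Data.Complex

-- Sequential radix-2 decimation-in-time FFT (a sketch).
fft :: [Complex Double] -> [Complex Double]
fft [x] = [x]
fft xs  = zipWith (+) es os' ++ zipWith (-) es os'
  where
    (evens, odds) = deinterleave xs      -- split into even/odd indexed elements
    es  = fft evens                       -- two recursive half-size FFTs
    os  = fft odds
    n   = length xs
    os' = zipWith (\k o -> twiddle k * o) [0 ..] os   -- apply twiddle factors
    twiddle k = cis (-2 * pi * fromIntegral k / fromIntegral n)

deinterleave :: [a] -> ([a], [a])
deinterleave []       = ([], [])
deinterleave [x]      = ([x], [])
deinterleave (x:y:zs) = let (xs, ys) = deinterleave zs in (x:xs, y:ys)
```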