Compiling tiled iteration spaces for clusters

Goumas, G.; Drosinos, Nikolaos; Athanasaki, Maria; Koziris, Nectarios

doi:10.1109/clustr.2002.1137768

Cited by 11 publications

(8 citation statements)

References 13 publications


“…Each MPI node assumes the execution of a sequence of tiles, successive along the longest dimension of the original iteration space. The complete methodology is described more extensively in [5]. It must be noted that since our prime objective was to experimentally verify the performance benefits of the different parallelization models, for the sake of simplicity we resorted to hand-made parallelization, as opposed to automatic parallelization.…”

Section: Pure MPI Paradigm (mentioning, confidence: 99%)

Advanced Hybrid MPI/OpenMP Parallelization Paradigms for Nested Loop Algorithms onto Clusters of SMPs

Drosinos

Koziris

2003

Recent Advances in Parallel Virtual Machine and Message Passing Interface


Abstract. The parallelization of nested-loop algorithms onto popular multi-level parallel architectures, such as clusters of SMPs, is not a trivial issue, since the existence of data dependencies in the algorithm imposes severe restrictions on the task decomposition to be applied. In this paper we propose three techniques for the parallelization of such algorithms, namely pure MPI parallelization, fine-grain hybrid MPI/OpenMP parallelization and coarse-grain hybrid MPI/OpenMP parallelization. We further apply an advanced hyperplane scheduling scheme that enables pipelined execution and the overlapping of communication with useful computation, thus leading to almost full CPU utilization. We implement the three variations and perform a number of micro-kernel benchmarks to verify the intuition that the hybrid programming model could potentially exploit the characteristics of an SMP cluster more efficiently than the pure message-passing programming model. We conclude that the overall performance of each model is both application and hardware dependent, and propose some directions for improving the efficiency of the hybrid model.
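The tile distribution described in the first citation statement (each MPI node executes a sequence of tiles, successive along the longest dimension) and the hyperplane scheduling mentioned in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes unit dependencies between tiles and the basic linear schedule Π = (1, …, 1), whereas the scheme in the paper additionally overlaps communication with computation, which is not modelled here. All function names are hypothetical.

```python
from itertools import product


def longest_dim(extents):
    """Index of the largest extent of the tile space (ties: lowest index)."""
    return max(range(len(extents)), key=lambda d: extents[d])


def owner(tile, mapped_dim):
    """Hypothetical mapping: one process owns every tile that shares all
    coordinates except the one along the mapped (longest) dimension."""
    return tuple(c for d, c in enumerate(tile) if d != mapped_dim)


def hyperplane_time(tile):
    """Linear schedule Pi = (1, ..., 1). Assuming tile dependencies are the
    unit vectors, tiles with equal coordinate sums are independent."""
    return sum(tile)


def schedule(extents):
    """Return one dict per time step, mapping process -> tile it executes.

    Two tiles with the same owner differ only along the mapped dimension,
    so their coordinate sums differ: each process runs at most one tile
    per step, and its tiles run in order along the longest dimension."""
    m = longest_dim(extents)
    steps = {}
    for tile in product(*(range(e) for e in extents)):
        steps.setdefault(hyperplane_time(tile), {})[owner(tile, m)] = tile
    return [steps[t] for t in sorted(steps)]


if __name__ == "__main__":
    for t, step in enumerate(schedule((2, 2, 4))):
        print(t, step)
```

With a 2 × 2 × 4 tile space, four processes each own a row of four tiles along the third (longest) dimension; process (1, 1) only starts at step 2, which is the pipeline fill implied by the hyperplane schedule.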



“…For the first two problems there is a skewed and an unskewed version [12], and for each version there are four (communication and scheduling) optimal tiling matrices (P1 − P4), calculated as described in [14] and [26]. The compilation efficiency of a method is measured by means of the compilation time, which increases as the number of row-operations performed increases.…”

Section: Measuring the Compilation Time and Performance of Sequential (mentioning, confidence: 99%)

“…In each case, we applied rectangular and non-rectangular tiling transformations. Although, as described in [12], non-rectangular tiling can be directly applied to the initial SOR and Jacobi code, in order to compare rectangular v.s. non-rectangular tiling, we apply them to the skewed version of their iteration space.…”

Section: Measuring the Performance of Arbitrarily Tiled Parallel Code (mentioning, confidence: 99%)

Automatic parallel code generation for tiled nested loops

Goumas

Drosinos

Athanasaki

et al. 2004

Proceedings of the 2004 ACM Symposium on Applied Computing


This paper presents an overview of our work, concerning a complete end-to-end framework for automatically generating message passing parallel code for tiled nested for-loops. It considers general parallelepiped tiling transformations and general convex iteration spaces. We address all problems regarding both the generation of sequential tiled code and its parallelization. We have implemented our techniques in a tool which automatically generates MPI parallel code and conducted several series of experiments, concerning the compilation time of our tool, the efficiency of the generated code and the speedup attained on a cluster of PCs. Apart from confirming the value of our techniques, our experimental results show the merit of general parallelepiped tiling transformations and verify previous theoretical work on scheduling-optimal tile shapes.
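The citation statements from this paper compare rectangular tiling of a skewed iteration space with non-rectangular tiling of the original one, and the abstract refers to general parallelepiped tiling transformations. Under the common textbook formulation, in which a tiling is given by a matrix H and iteration point j belongs to tile ⌊H j⌋, the two views coincide; this sketch checks that for a two-dimensional space. The skew (i, j) → (i, i + j), the tile size S = 4 and all function names are illustrative assumptions, not the transformations or code of the tool described in the paper.

```python
from fractions import Fraction
from math import floor


def tile_of(H, point):
    """Tile coordinates floor(H . j) of iteration point j under tiling
    matrix H (rows are scaled normals of the tile faces). Fractions keep
    the floor exact."""
    return tuple(floor(sum(h * x for h, x in zip(row, point))) for row in H)


def rectangular(sizes):
    """H for rectangular tiles with the given side lengths: diag(1/s)."""
    n = len(sizes)
    return [[Fraction(1, sizes[r]) if c == r else Fraction(0) for c in range(n)]
            for r in range(n)]


def skew(point):
    """Hypothetical skewing (i, j) -> (i, i + j), of the kind often applied
    to SOR/Jacobi-like stencils before rectangular tiling."""
    i, j = point
    return (i, i + j)


# Assumed tile size. The parallelepiped tiling of the ORIGINAL space that
# corresponds to rectangular S x S tiles of the skewed space is
# H = (1/S) * [[1, 0], [1, 1]] (rectangular H times the skewing matrix).
S = 4
H_par = [[Fraction(1, S), Fraction(0)], [Fraction(1, S), Fraction(1, S)]]

if __name__ == "__main__":
    rect = rectangular((S, S))
    same = all(tile_of(H_par, (i, j)) == tile_of(rect, skew((i, j)))
               for i in range(16) for j in range(16))
    print(same)  # both views assign every point to the same tile
```

Representing tilings by H keeps rectangular and non-rectangular tiles in one code path: only the matrix changes, not the tile-index computation.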


“…On the other hand in distributed memory computer systems, it is known that coarse-grain methods such as the ones presented in [10], [11], [12] offer better data locality and more efficient use of the memory than the fine-grain approaches. The use of coarse-grain methods can lead to performance improvements also in the case of embedded systems.…”

Section: Introduction (mentioning, confidence: 99%)

Implementation of dynamic loop scheduling in reconfigurable platforms

Riakiotakis

Παπακωνσταντίνου

Chronopoulos

2008

2008 International Symposium on Industrial Embedded Systems


Abstract. Dynamic scheduling algorithms have been successfully used for parallel computations of nested loops in traditional parallel computers and clusters. In this paper we propose a new architecture, implementing coarse-grain dynamic loop scheduling, suitable for reconfigurable hardware platforms. We use an analytical model and a case study to evaluate the performance of the proposed architecture. This approach makes efficient use of memory and processing elements and thus gives better results than previous approaches.
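The abstract does not detail its scheduling algorithm. As one well-known example of coarse-grain dynamic loop scheduling, guided self-scheduling (GSS) hands each requesting worker ⌈R/P⌉ of the R remaining iterations, where P is the number of workers, so chunks start large and shrink toward the end of the loop. The sketch below computes that sequence in software; it is a swapped-in illustration, not the reconfigurable architecture proposed in the paper, and the function names are hypothetical.

```python
def guided_chunks(n_iters, n_workers):
    """Chunk sizes issued by guided self-scheduling: each request receives
    ceil(R / P) iterations, R being the number still unassigned."""
    chunks = []
    remaining = n_iters
    while remaining > 0:
        c = -(-remaining // n_workers)  # exact integer ceil(R / P)
        chunks.append(c)
        remaining -= c
    return chunks


def guided_ranges(n_iters, n_workers):
    """Half-open iteration ranges [start, end) in the order they are issued;
    whichever worker becomes idle next takes the next range."""
    ranges = []
    start = 0
    for c in guided_chunks(n_iters, n_workers):
        ranges.append((start, start + c))
        start += c
    return ranges


if __name__ == "__main__":
    print(guided_chunks(100, 4))
    print(guided_ranges(10, 2))
```

For 100 iterations and 4 workers the first chunks are 25, 19, 14 and 11, and the tail degrades to single iterations; large early chunks keep scheduling overhead low, while small late chunks help balance the load among workers.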

