Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs

Athanasaki, Maria; Sotiropoulos, A.; Tsoukalas, G.; Koziris, Nectarios; Tsanakas, Panayiotis

doi:10.1007/s11227-005-0298-8

Cited by 6 publications (3 citation statements)

References 46 publications (45 reference statements)

“…To compute the last tile coordinates, relation (2) can be applied to the last iteration point, UB = (ub 1 , ub 2 , . .…”

Section: Wavefront Transformation (mentioning)

confidence: 99%

“…11 An algorithm for computing the completion time of a scheduling total execution time or the completion time of a scheduling is given. In this algorithm it is assumed that the computation time of each tile and communication time between dependent tiles are overlapped [2,8,14]. Also, message passing between processors is performed in parallel and processors start to execute all the tiles on the same wavefront at the same time.…”

Section: Scheduling N-dimensional Tiles (mentioning)

confidence: 99%

Parallel loop generation and scheduling

Lotfi, Parsa (2009). J Supercomput.

Loop tiling is an efficient loop transformation, mainly applied to detect coarse-grained parallelism in loops. It is a difficult task to apply n-dimensional nonrectangular tiles to generate parallel loops. This paper offers an efficient scheme to apply non-rectangular n-dimensional tiles in non-rectangular iteration spaces, to generate parallel loops. In order to exploit wavefront parallelism efficiently, all the tiles with equal sum of coordinates are assumed to reside on the same wavefront. Also, in order to assign parallelepiped tiles on each wavefront to different processors, an improved block scheduling strategy is offered in this paper.
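The wavefront rule stated in this abstract, that tiles with an equal sum of coordinates lie on the same wavefront, can be illustrated with a minimal sketch. It assumes a rectangular grid of tiles for simplicity, whereas the paper targets non-rectangular tiles and iteration spaces. The function names `wavefronts` and `block_assign` are hypothetical, and the assignment shown is a plain block distribution, not the improved block scheduling strategy the paper proposes.

```python
from collections import defaultdict
from itertools import product


def wavefronts(tile_counts):
    """Group tile coordinates by the sum of their coordinates.

    tile_counts gives the number of tiles along each dimension of an
    assumed rectangular tile space. Returns a list of wavefronts,
    ordered by increasing coordinate sum.
    """
    fronts = defaultdict(list)
    for tile in product(*(range(n) for n in tile_counts)):
        fronts[sum(tile)].append(tile)
    return [fronts[k] for k in sorted(fronts)]


def block_assign(front, num_procs):
    """Split one wavefront into contiguous chunks, one per processor.

    This is a plain block distribution used only for illustration.
    """
    chunk = -(-len(front) // num_procs)  # ceiling division
    return [front[p * chunk:(p + 1) * chunk] for p in range(num_procs)]


if __name__ == "__main__":
    for step, front in enumerate(wavefronts((2, 3))):
        print(step, block_assign(front, 2))
```

Under the assumptions quoted above (all tiles on a wavefront start together and are independent of one another), the wavefronts run one after another, so the number of wavefronts gives the number of sequential steps in this simplified picture.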

“…Athanasaki et al describe another approach that also uses tiling to reduce communication for distributed memory based clusters [2]. In their approach, an additional tiling transformation is used for aggregating processor tiles along certain hyperplanes into so-called groups, which can be executed efficiently by exploiting the availability of communication-free shared memory processors on each node in the cluster.…”

Section: Related Work (mentioning)

confidence: 99%

Automatic code generation for distributed memory architectures in the polytope model

Classen, Griebl (2006). Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.

The polytope model has been used successfully as a tool for program analysis and transformation in the field of automatic loop parallelization. However, for the final step of automatic code generation, the generated code is either only usable on shared memory architectures or severely restricts the parallelization methods that can be applied. In this paper, we present a fully automated method for generating efficient target code, which is executable on clusters that are based on a distributed memory architecture. We also provide speedup results of experiments on a local cluster.
