1997
DOI: 10.1109/71.577265

Fusion of loops for parallelism and locality

Abstract: Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-carried dependences which prevent parallelism. In addition, performance losses result from cache conflicts in fused loops. In this paper, we present new techniques to: 1) allow fusion of loop nests in the presence of fusion-preventing dependences, 2) maintain parallelism and allow the parallel execution of fused loops with minimal syn…
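To make the abstract's legality point concrete, here is a minimal sketch of what a fusion-preventing dependence looks like. It is not taken from the paper: the function names (`separate`, `fused_naive`, `fused_shifted`) and the arrays are hypothetical, and the shifting shown is one generic way to restore legality, not necessarily the technique the authors present.

```python
def separate(a):
    """Two separate loops: the reference (unfused) computation."""
    n = len(a)
    b = [0] * n
    c = [0] * (n - 1)
    for i in range(n):
        b[i] = a[i] + 1
    for i in range(n - 1):
        c[i] = b[i] + b[i + 1]  # needs b[i + 1], produced by a later iteration of loop 1
    return b, c


def fused_naive(a):
    """Naive fusion: illegal here, because c[i] reads b[i + 1] before it is written."""
    n = len(a)
    b = [0] * n
    c = [0] * (n - 1)
    for i in range(n):
        b[i] = a[i] + 1
        if i < n - 1:
            c[i] = b[i] + b[i + 1]  # b[i + 1] still holds its initial 0
    return b, c


def fused_shifted(a):
    """Fusion after shifting the second statement by one iteration (hypothetical fix)."""
    n = len(a)
    b = [0] * n
    c = [0] * (n - 1)
    for i in range(n):
        b[i] = a[i] + 1
        if i >= 1:
            c[i - 1] = b[i - 1] + b[i]  # both operands are already computed
    return b, c
```

In the shifted version each iteration uses `b[i - 1]` from the previous iteration, so the fused loop now has a loop-carried dependence. That is the loss of parallelism the abstract mentions.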

Citations: cited by 83 publications (44 citation statements)
References: 21 publications
“…This transformation is not only help for parallelization, but also contributes benefits to other aspects, such as reducing loop bound testing, promoting data locality etc [19]. Manjikian et al [20] presented a typical approach, which integrates both parallelism and memory locality optimization. Darte [19] discussed the complexity of loop transformation, which is help for knowing the ultimate potentialities of this method.…”
Section: Loop Transformation (mentioning)
confidence: 99%
“…Mitchel et al study interactions between tiling for multiple objectives at once [16]. In addition to tiling, researchers working on locality optimizations have considered both computation-reordering transformations such as loop permutation [9,17,25] and loop fission/fusion [15,17]. Scalar replacement replaces array references with scalars, reducing the number of memory references if the compiler later puts the scalar in a register [3].…”
Section: Related Work (mentioning)
confidence: 99%
“…However, these techniques do not consider the ghost zone technique that reduces the inter-tile communication. Although loop fusion [25] and time skewing [36] are able to generate tiles that can execute concurrently with improved locality, they cannot eliminate the communication between concurrent tiles if more than one stencil loops are fused into one tile. This enforces bulk-synchronous systems, such as NVIDIA's Tesla architecture, to frequently synchronizes computation among different PEs, which eventually penalizes performance.…”
Section: Related Work (mentioning)
confidence: 99%