1998
DOI: 10.1145/277652.277723
Communication optimizations for parallel C programs

Abstract: This paper presents algorithms for reducing the communication overhead for parallel C programs that use dynamically allocated data structures. The framework consists of an analysis phase called possible-placement analysis, and a transformation phase called communication selection. The fundamental idea of possible-placement analysis is to find all possible points for insertion of remote memory operations. Remote reads are propagated upwards, whereas remote writes are propagated downwards. Based on the results o…

Cited by 10 publications (10 citation statements) | References 16 publications
“…One popular method for optimizing MPI applications is to overlap the computation of MPI applications and the communication of underlying MPI libraries since it can hide costly communication latency [4], [5], [6], [7]. Such overlapping can be achieved by invoking nonblocking MPI functions or RDMA (Remote Direct Memory Access) functions.…”
Section: A Motivation
confidence: 99%
“…The straightforward approach to applying this transformation is to convert one-sided blocking get/put operations into an initiation call and a corresponding synchronization call, then perform code motion to separate the two as far as possible while inserting independent computation or communication code in between. Several studies [30,17,6] have proposed global communication scheduling techniques that attempt to find an optimal arrangement for all non-blocking memory accesses. Other variants of this optimization such as message strip mining [27] and software prefetching [18] are also useful in reducing an application's stall times due to communication latencies.…”
Section: Communication/Computation Overlap
confidence: 99%
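Message strip mining, mentioned above, can be sketched as follows in plain C. `fetch_chunk` is a hypothetical stand-in for a nonblocking get, emulated here as a local copy, so no latency is actually hidden; the point is the loop structure that lets the fetch of piece k+1 overlap computation on piece k:

```c
#include <string.h>

#define CHUNK 4

/* Stand-in for a nonblocking get of one strip; emulated locally. */
static void fetch_chunk(int *dst, const int *src, int len) {
    memcpy(dst, src, (size_t)len * sizeof(int));
}

/* One large get over n elements is strip-mined into CHUNK-sized
 * pieces, double-buffered so the next piece can be fetched while
 * the current piece is consumed. */
long strip_mined_sum(const int *remote, int n) {
    int buf[2][CHUNK];                          /* double buffer  */
    long sum = 0;
    int cur = 0;
    int first = n < CHUNK ? n : CHUNK;
    fetch_chunk(buf[cur], remote, first);       /* prologue fetch */
    for (int off = 0; off < n; off += CHUNK) {
        int len = (n - off) < CHUNK ? (n - off) : CHUNK;
        int next = off + CHUNK;
        if (next < n) {                         /* prefetch next  */
            int nlen = (n - next) < CHUNK ? (n - next) : CHUNK;
            fetch_chunk(buf[1 - cur], remote + next, nlen);
        }
        for (int i = 0; i < len; i++)           /* compute on the */
            sum += buf[cur][i];                 /* current strip  */
        cur = 1 - cur;                          /* swap buffers   */
    }
    return sum;
}
```

With a genuinely asynchronous `fetch_chunk`, each strip's transfer would overlap the computation on the previous strip, which is the effect the cited strip-mining and prefetching work aims for.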
“…Several studies ([5], [22]) present compiler algorithms that perform possible-placement analysis, both on basic blocks and whole programs. Code motion of communication operations in UPC needs to be supplemented with an analysis to ensure that the new schedule of operations does not violate the memory consistency model of the language.…”
Section: Communication and Computation Overlap
confidence: 99%