Proceedings of the 9th International Conference on Supercomputing - ICS '95 1995
DOI: 10.1145/224538.224539
Decoupling synchronization and data transfer in message passing systems of parallel computers

Cited by 26 publications (16 citation statements)
References 13 publications
“…It does not incur any copying/buffering during a data transfer, since low communication overhead is critical for sparse code with mixed granularities. RMA is available in modern multiprocessor architectures such as Cray-T3D [34], T3E [32], and Meiko CS-2 [15]. Since the RMA directly writes data to a remote address, it is possible that the content at the remote address is still being used by other tasks and, then, the execution at the remote processor could be incorrect.…”
Section: Scheduling and Run-time Support for 1D Methods
confidence: 99%
“…The communication network of the T3D is a 3D torus. Cray provides a shared memory access library called shmem, which can achieve 126 Mbytes/s bandwidth and 2.7ms communication overhead using the shmem_put() primitive [34]. We have used shmem_put() for the communications in all the implementations.…”
Section: Experimental Studies
confidence: 99%
“…Based on this first series of experiments alone, it cannot be concluded whether overlap of computation and communication is beneficial or detrimental to the performance and scalability of CHARMM on a particular platform. Decoupling computation, synchronization, and data transfer resulted in better performance for certain compiled parallel programs on the Cray T3D and other machines [21].…”
Section: Previous Work
confidence: 99%
“…While the data elements are stored in a distributed array, the permutation itself is specified by a table of index pairs, where each table entry contains a source index and a destination index. Using the direct deposit model [7], synchronization and consistency are guaranteed by the use of hardware barriers, and the data transfers are performed by remote stores using the messaging system. For distributed memory systems the index relation table must be ordered so that all transfers for a given source-destination pair are grouped together.…”
Section: Impact of Memory System Performance
confidence: 99%