Proceedings of the IEEE International Conference on Cluster Computing (Cluster 2003), 2003
DOI: 10.1109/clustr.2003.1253341

Improving the performance of MPI derived datatypes by optimizing memory-access cost

Abstract: The MPI (Message Passing Interface) Standard is widely used in parallel computing for writing distributed-memory parallel programs [1,2]. MPI has a number of features that provide both convenience and high performance. One of the important features is the concept of derived datatypes. Derived datatypes enable users to describe noncontiguous memory layouts compactly and to use this compact representation in MPI communication functions. Derived datatypes also enable an MPI implementation to optim…
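The abstract's point about describing noncontiguous layouts compactly is easiest to see in code. The sketch below is not taken from the paper; the matrix shape and variable names are illustrative. It builds an MPI_Type_vector describing one column of a row-major matrix and passes it directly to MPI_Send/MPI_Recv, so no manual packing loop is needed (run with at least two ranks).

```c
/* Minimal sketch (illustrative, not from the paper): a derived datatype
 * describing a strided, noncontiguous layout, used directly in communication.
 * Sends one column of a row-major ROWS x COLS matrix without manual packing. */
#include <mpi.h>

#define ROWS 4
#define COLS 6

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double a[ROWS][COLS];
    MPI_Datatype column;

    /* ROWS blocks of 1 double each, separated by a stride of COLS doubles:
     * a compact description of one matrix column. */
    MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    if (rank == 0) {
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                a[i][j] = i * COLS + j;
        /* The derived datatype is passed to the communication call directly;
         * the MPI library walks the noncontiguous layout itself. */
        MPI_Send(&a[0][2], 1, column, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&a[0][2], 1, column, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Type_free(&column);
    MPI_Finalize();
    return 0;
}
```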

Cited by 36 publications (15 citation statements) · References 10 publications
“…Furthermore, Byna et al show that tuning the datatype interpreter for the memory-hierarchy on the target machine (cache and page sizes, etc.) can lead to performance improvements [3]. They do this by providing different packing functions which are parametrized with information about the actual datatype as well as information about the memory hierarchy of the node.…”
Section: Related Work (mentioning)
confidence: 99%
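As an illustration of what a packing function "parametrized with information about the actual datatype as well as information about the memory hierarchy of the node" could look like, here is a minimal hypothetical sketch in C. The struct, function names, and the cache-size threshold are all invented for illustration; this is not the code of Byna et al.

```c
/* Hypothetical sketch (names invented, not the paper's code): selecting a
 * packing routine from layout parameters and a cache size, the kind of
 * memory-hierarchy parametrization the citation describes. */
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t count;        /* number of strided blocks            */
    size_t block_bytes;  /* contiguous bytes per block          */
    size_t stride_bytes; /* distance between block start points */
} strided_layout;

/* Straightforward block-by-block copy into a contiguous buffer. */
static void pack_simple(char *dst, const char *src, const strided_layout *l)
{
    for (size_t i = 0; i < l->count; i++)
        memcpy(dst + i * l->block_bytes, src + i * l->stride_bytes, l->block_bytes);
}

/* Placeholder for a cache-aware variant; here it simply reuses the plain copy. */
static void pack_cache_blocked(char *dst, const char *src, const strided_layout *l)
{
    pack_simple(dst, src, l);
}

/* Choose a packing function based on the layout and a cache parameter. */
static void pack(char *dst, const char *src, const strided_layout *l,
                 size_t l1_cache_bytes)
{
    if (l->block_bytes <= l1_cache_bytes)
        pack_simple(dst, src, l);
    else
        pack_cache_blocked(dst, src, l);
}
```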
“…Previous work in the area of MPI derived datatypes focuses on improving its performance, either by improving the way derived datatypes are represented in MPI or by using more cache efficient strategies for packing and unpacking the datatype to and from a contiguous buffer [6]. Interconnect features such as RDMA Scatter/Gather operations [20] have also been considered.…”
Section: Related Work (mentioning)
confidence: 99%
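The packing and unpacking to and from a contiguous buffer that this statement refers to can also be performed explicitly through MPI's own MPI_Pack/MPI_Unpack interface. The short sketch below is illustrative only; the 4x6 column layout and function name are assumptions, and error handling is omitted.

```c
/* Minimal sketch: explicitly packing a derived datatype into a contiguous
 * buffer and unpacking it again, the operation whose memory-access cost the
 * cited work targets. */
#include <mpi.h>
#include <stdlib.h>

void pack_roundtrip(double src[4][6], double dst[4][6], MPI_Comm comm)
{
    MPI_Datatype column;
    MPI_Type_vector(4, 1, 6, MPI_DOUBLE, &column);  /* one column of a 4x6 matrix */
    MPI_Type_commit(&column);

    int packed_size;
    MPI_Pack_size(1, column, comm, &packed_size);   /* upper bound on packed bytes */
    char *buf = malloc(packed_size);

    int pos = 0;
    MPI_Pack(&src[0][0], 1, column, buf, packed_size, &pos, comm);   /* gather */

    pos = 0;
    MPI_Unpack(buf, packed_size, &pos, &dst[0][0], 1, column, comm); /* scatter */

    free(buf);
    MPI_Type_free(&column);
}
```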
“…Not many scientific codes leverage MPI DDTs, even though their usage would be appropriate in many cases. One of the reasons might be that current MPI implementations in some cases still fail to deliver the expected performance, as shown by Gropp et al in [9], even though a lot of work is done on improving DDT implementations [6,18,20]. Most of this work is guided by a small number of micro-benchmarks.…”
Section: Introduction (mentioning)
confidence: 99%
“…To address the needs of these applications, communication layers provide higher level primitives for such transfers. These operations are usually referred to in literature as vector or strided and they have been shown to improve application performance [19,23,7]. Most of the native communication layers, such as Elan, InfiniBand Verbs, IBM LAPI provide an API for these operations.…”
Section: Introduction (mentioning)
confidence: 99%