2010
DOI: 10.1177/1094342009360206
Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming

Abstract: As high-end computing systems continue to grow in scale, recent advances in multi- and many-core architectures have pushed such growth toward more dense architectures, that is, more processing elements per physical node rather than more physical nodes themselves. Although a large number of scientific applications have so far relied on an MPI-everywhere model for programming high-end parallel systems, this model may not be sufficient for future machines, given their physical constraints such as decreasing amou…

Cited by 53 publications (32 citation statements) | References 12 publications
“…Work early in the project was performed in collaboration with both Argonne and the IBM Blue Gene team [10]. Work on fine-grained multithreading support showed how to avoid excessive lock overhead in an MPI implementation [3,2]. Recent work included a new algorithm for efficient allocation of context ids in MPI that fixes a subtle race condition in the algorithm that had been used in MPICH; the new algorithm retains the efficient behavior for the expected case [9].…”
Section: Some of the Most Interesting Results from This Project Addre…
confidence: 99%
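The race the statement refers to arises when multiple threads create communicators concurrently and must agree on a fresh context id. A minimal sketch of the underlying allocation problem, using a mutex-protected bitmap (illustrative only; this is not MPICH's actual algorithm, and the names `ctx_alloc`/`ctx_release` are hypothetical):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Context ids drawn from a shared bitmap.  One mutex serializes
 * allocation so two threads creating communicators concurrently
 * can never be handed the same id. */
#define NUM_CTX 64
static uint64_t ctx_free_mask = ~(uint64_t)0;   /* bit i set => id i free */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;

int ctx_alloc(void)                 /* returns an id, or -1 if exhausted */
{
    int id = -1;
    pthread_mutex_lock(&ctx_lock);
    for (int i = 0; i < NUM_CTX; i++) {
        if (ctx_free_mask & ((uint64_t)1 << i)) {
            ctx_free_mask &= ~((uint64_t)1 << i);   /* mark id taken */
            id = i;
            break;
        }
    }
    pthread_mutex_unlock(&ctx_lock);
    return id;
}

void ctx_release(int id)
{
    pthread_mutex_lock(&ctx_lock);
    ctx_free_mask |= (uint64_t)1 << id;             /* mark id free again */
    pthread_mutex_unlock(&ctx_lock);
}
```

The subtlety the cited work addresses is that a real implementation must make *all* processes in the parent communicator agree on the same id, which a purely local lock cannot guarantee on its own.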
“…In addition to this common object allocation layer used for all MPI objects, MPICH2 provides another small optimization above this layer for MPI_Request objects [3]. In general, an MPI implementation must allocate a request object for each communication operation such as MPI_Send or MPI_Recv.…”
Section: MPICH2 Internals Background
confidence: 99%
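The optimization the statement describes can be sketched as a per-thread free list of request objects: each thread recycles the requests it allocated, so the common send/receive path touches no shared state and needs no lock. A minimal sketch, with hypothetical names (`request_alloc`/`request_free` are not MPICH2's actual identifiers):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct request {
    struct request *next;   /* free-list link */
    int in_use;             /* stands in for real request state */
} request_t;

/* One free list per thread: pops and pushes need no synchronization. */
static __thread request_t *tls_free_list = NULL;

request_t *request_alloc(void)
{
    request_t *req = tls_free_list;
    if (req != NULL)
        tls_free_list = req->next;      /* fast path: reuse, no lock */
    else
        req = malloc(sizeof *req);      /* slow path: fresh object */
    req->in_use = 1;
    return req;
}

void request_free(request_t *req)
{
    req->in_use = 0;
    req->next = tls_free_list;          /* push back onto own list */
    tls_free_list = req;
}
```

Because an MPI implementation allocates a request per communication operation, removing even one atomic instruction from this path matters at high message rates.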
“…For example, keeping track of |T | by atomically incrementing or decrementing a shared counter on every modification of T would incur a severe performance penalty. As mentioned in Section III, MPICH2 currently uses a thread-local storage optimization [3] to manage request allocation. This optimization eliminates virtually all contention from request allocation.…”
Section: B. Reference Counting with Garbage Collection Hybridization
confidence: 99%
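The penalty the statement warns about comes from the naive alternative: tracking a shared count with an atomic read-modify-write on every change, so every thread contends for the same cache line. A minimal sketch of that shared-counter approach using C11 atomics (illustrative; the `obj_retain`/`obj_release` names are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_int refs;    /* shared counter: every update is contended */
} obj_t;

void obj_retain(obj_t *o)
{
    atomic_fetch_add(&o->refs, 1);      /* RMW on a shared cache line */
}

int obj_release(obj_t *o)               /* returns 1 on the last release */
{
    return atomic_fetch_sub(&o->refs, 1) == 1;
}
```

Each call is correct but serializes all threads through one memory location, which is exactly the cost the thread-local-storage optimization avoids on the request-allocation path.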
“…When a lock is released, a thread waiting for the global lock gets access to the lock and performs progress on its MPI communication. We are also developing a more efficient version of MPICH2 that supports finer-grained locks [1].…”
Section: Threads
confidence: 99%
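The coarse-grained model described above can be sketched as follows: every MPI entry point runs under one global mutex, and a thread that must wait for completion drops the mutex so another thread can enter and drive the progress engine. A toy single-lock sketch with hypothetical names (`mpi_global`, `progress_engine_poke`; the completion condition is simulated):

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t mpi_global = PTHREAD_MUTEX_INITIALIZER;
static int progress_counter = 0;        /* stands in for real progress */

static void progress_engine_poke(void) { progress_counter++; }

void mpi_call_nonblocking(void)
{
    pthread_mutex_lock(&mpi_global);    /* whole call under the lock */
    progress_engine_poke();
    pthread_mutex_unlock(&mpi_global);
}

void mpi_call_blocking(int *done_flag)
{
    pthread_mutex_lock(&mpi_global);
    while (!*done_flag) {
        /* Release so a waiting thread can enter and make progress,
         * then reacquire and re-check the completion condition. */
        pthread_mutex_unlock(&mpi_global);
        sched_yield();
        pthread_mutex_lock(&mpi_global);
        progress_engine_poke();
        if (progress_counter >= 3)      /* toy completion condition */
            *done_flag = 1;
    }
    pthread_mutex_unlock(&mpi_global);
}
```

The finer-grained design cited in [1] replaces this single global lock with per-object or per-path locks so that independent operations no longer serialize on one mutex.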