Proceedings of the 26th ACM International Conference on Supercomputing 2012
DOI: 10.1145/2304576.2304605

Composable, non-blocking collective operations on Power7 IH

Abstract: The Power7 IH (P7IH) is one of IBM's latest generation of supercomputers. Like most modern parallel machines, it has a hierarchical organization consisting of simultaneous multithreading (SMT) within a core, multiple cores per processor, multiple processors per node (SMP), and multiple SMPs per cluster. A low latency/high bandwidth network with specialized accelerators is used to interconnect the SMP nodes. System software is tuned to exploit the hierarchical organization of the machine. In this paper we presen…

Cited by 10 publications (10 citation statements)
References 19 publications
“…The new code transformations are prototyped in the XLUPC compiler framework [32] that uses the IBM PGAS runtime [33]. The compiler contains additional optimizations for UPC and other languages, including C and C++.…”
Section: Methods (mentioning)
confidence: 99%
“…PAMI collective library builds upon the portable Component Collective Messaging Interface (CCMI [4]) from Blue Gene/P, with optimized implementations on Blue Gene/Q, Power7 IH [6] and Intel x86 clusters.…”
Section: PAMI Collectives (mentioning)
confidence: 99%
“…Optimizations are possible for both point to point messages and collectives. Very good performance can often be attained using this approach [18]. This, of course, still leaves the system with a very large number of processes running, which can cause bottlenecks.…”
Section: Related Work (mentioning)
confidence: 99%
“…A major difference is the completion notification using callbacks in PAMI versus events in CCI. In a related paper [18] we discuss the importance of callbacks for composing non-blocking primitives (P2P, collective). Another important difference relative to the results reported in [6] is that this paper focuses especially on a multi-threaded deployment, while the CCI paper only presents results for the distributed memory case.…”
Section: Related Work (mentioning)
confidence: 99%
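
The citation statement above refers to completion callbacks as the mechanism for composing non-blocking primitives. A minimal sketch of that idea follows, assuming a toy single-process progress engine. Every name here (`Progress`, `ireduce`, `ibroadcast`, `iallreduce`) is hypothetical and chosen for illustration; none of it is the PAMI or CCI API, and it is not the paper's implementation.

```python
from collections import deque


class Progress:
    """Toy progress engine (hypothetical): queues work, runs it on advance()."""

    def __init__(self):
        self._pending = deque()

    def post(self, work):
        # Queue a unit of work; nothing executes until advance() is called.
        self._pending.append(work)

    def advance(self):
        # Drain the queue. Callbacks may post more work while we drain.
        while self._pending:
            self._pending.popleft()()


def ireduce(engine, values, on_done):
    """Non-blocking sum-reduce: returns at once, later calls on_done(total)."""
    engine.post(lambda: on_done(sum(values)))


def ibroadcast(engine, value, nranks, on_done):
    """Non-blocking broadcast: later calls on_done with one copy per rank."""
    engine.post(lambda: on_done([value] * nranks))


def iallreduce(engine, values, on_done):
    """Composed operation: the reduce's completion callback starts the broadcast."""
    ireduce(engine, values,
            lambda total: ibroadcast(engine, total, len(values), on_done))


engine = Progress()
results = []
iallreduce(engine, [1, 2, 3, 4], results.append)
assert results == []      # the call returned before any work ran
engine.advance()          # drive both stages to completion
print(results)            # [[10, 10, 10, 10]]
```

The caller never blocks between the two stages: the second primitive is launched from inside the first one's callback, so a composed collective can be built without the caller polling for intermediate completion.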