Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis 2013
DOI: 10.1145/2503210.2503272
An early performance evaluation of many integrated core architecture based SGI rackable computing system

Abstract: Intel recently introduced the Xeon Phi coprocessor, based on the Many Integrated Core architecture, featuring 60 cores with a peak performance of 1.0 Tflop/s. NASA has deployed a 128-node SGI Rackable system where each node has two Intel Xeon E5-2670 8-core Sandy Bridge processors along with two Xeon Phi 5110P coprocessors. We have conducted an early performance evaluation of the Xeon Phi. We used microbenchmarks to measure the latency and bandwidth of memory and interconnect, I/O rates, and the performance of Ope…

Cited by 16 publications (14 citation statements). References 14 publications.
“…It should be noted that performance of MPI functions in native MIC mode is 3 to 20 times worse than in native host mode as reported by Saini et al [13]. Poor scalability for BT and SP on MIC is because of load imbalance using the pure MPI paradigm.…”
Section: A. NAS Parallel Benchmarks, 1) MPI Version
Confidence: 84%
“…Applications with significant amounts of MPI communication, especially collective communication, perform very poorly on MIC because the performance of MPI functions is 3 to 20 times slower for intra-MIC and 10 to 60 times slower for inter-MIC communication as compared to host [13]. To reduce MPI communication time, we performed optimization by packing and unpacking the MPI messages.…”
Section: Discussion
Confidence: 99%