2013 42nd International Conference on Parallel Processing 2013
DOI: 10.1109/icpp.2013.87

Performance Evaluation of NAS Parallel Benchmarks on Intel Xeon Phi

Abstract: The NAS Parallel Benchmarks (NPB) are a set of applications commonly used to evaluate parallel systems. We use the NPB-OpenMP version to examine the performance of Intel's new Xeon Phi co-processor, focusing especially on the many-core aspect of the Xeon Phi architecture. A first analysis studies scalability up to 244 threads on 61 cores and the impact of affinity settings on scaling, and compares the performance characteristics of the Xeon Phi and traditional Xeon CPUs. The application of several well-established optim…
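The 244-thread figure comes from 61 cores with 4 hardware threads each, and "affinity settings" decide which core each OpenMP thread runs on. The sketch below is a simplified model of the `compact`, `scatter` and `balanced` placement types of Intel's OpenMP runtime (`KMP_AFFINITY`). It assumes logical CPUs are numbered core by core and ignores OS CPU numbering and any reserved core, so it illustrates the idea rather than reproducing the runtime's exact behavior. The function names `place` and `cores_used` are hypothetical.

```python
"""Simplified model of OpenMP thread placement on a 61-core Xeon Phi.

Assumption: 61 cores x 4 hardware threads, logical CPUs numbered core by
core; the real runtime's OS numbering and reserved cores are ignored.
"""

CORES = 61
THREADS_PER_CORE = 4


def place(n_threads: int, policy: str, cores: int = CORES,
          per_core: int = THREADS_PER_CORE) -> list[int]:
    """Return the core index assigned to each OpenMP thread 0..n_threads-1."""
    if not 0 < n_threads <= cores * per_core:
        raise ValueError("thread count must fit the available hardware threads")
    if policy == "compact":
        # Fill every hardware thread of a core before moving to the next core.
        return [t // per_core for t in range(n_threads)]
    if policy == "scatter":
        # Round-robin: neighbouring thread IDs land on different cores.
        return [t % cores for t in range(n_threads)]
    if policy == "balanced":
        # Spread threads evenly over all cores, keeping consecutive IDs together.
        base, extra = divmod(n_threads, cores)
        mapping: list[int] = []
        for core in range(cores):
            count = base + (1 if core < extra else 0)
            mapping.extend([core] * count)
        return mapping
    raise ValueError(f"unknown policy: {policy}")


def cores_used(mapping: list[int]) -> int:
    """Number of distinct cores that received at least one thread."""
    return len(set(mapping))


if __name__ == "__main__":
    for n in (61, 122, 244):
        for policy in ("compact", "scatter", "balanced"):
            m = place(n, policy)
            print(f"{n:3d} threads, {policy:8s}: {cores_used(m):2d} cores, "
                  f"threads 0-3 on cores {m[:4]}")
```

In this model, 61 threads under `compact` occupy only 16 cores and leave 45 idle, while `scatter` and `balanced` use all 61. At 244 threads every policy fills the chip, so affinity matters most at intermediate thread counts.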

Cited by 36 publications (28 citation statements)
References 8 publications
“…Paper [6] describes some opportunities and challenges about porting to the Intel Xeon Phi. Paper [7] makes a brief performance comparison of NAS Parallel Benchmarks running on CPU and MIC separately. But this research does not combine the computation power of MIC with CPU to display higher efficiency of CPU/MIC heterogeneous node.…”
Section: Related Work (mentioning; confidence: 99%)
“…Scientific and engineering applications have small instruction footprints, long basic blocks, and low control divergence which makes them suitable for SIMD execution. Nowadays, Intel's Xeon Phi cores [27] and Fujitsu's SPARC64 series of chips [28] implement wide vector units to exploit these code characteristics and gain performance. Our work revisits these findings considering modern HPC workloads and in the context of CMPs made out of light-weight out-of-order cores.…”
Section: Related Work (mentioning; confidence: 99%)
“…Because of that, ARM's Cortex-A57 cores, used in microservers, have a larger 48 KB I-cache to reduce the impact of I-cache misses [34]. An Intel Xeon Phi core has 512-bit wide vector processing unit so it can exploit the SIMD characteristics of scientific codes [27]. Our findings suggest that a similar core tailoring can be applied to lean-core CMPs used in HPC by redimensioning the existing structures based on application demands.…”
Section: Related Work (mentioning; confidence: 99%)
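The two statements above tie the Xeon Phi's 512-bit vector unit to scientific codes with long, branch-free loop bodies. As a rough illustration, the sketch below computes how many elements fit in one 512-bit register and strip-mines a DAXPY loop (`a*x + y`) into full-vector chunks plus a scalar remainder. The DAXPY kernel and the helper names are illustrative assumptions, not code from the cited works, and the inner loop only emulates what the hardware does in a single vector instruction.

```python
"""Sketch: what a 512-bit vector unit means for a simple loop.

Assumptions (illustrative, not from the cited works): elements are IEEE
doubles or floats, and the loop body is branch-free, so every lane executes
the same operation.
"""

VECTOR_BITS = 512  # vector width of a Xeon Phi core, as cited above


def lanes(element_bits: int, vector_bits: int = VECTOR_BITS) -> int:
    """How many elements one vector register holds."""
    return vector_bits // element_bits


def strip_mine(n: int, width: int) -> tuple[int, int]:
    """Split n loop iterations into (full-vector chunks, scalar remainder)."""
    return divmod(n, width)


def daxpy_strip_mined(a: float, x: list[float], y: list[float],
                      width: int) -> list[float]:
    """Compute a*x + y chunk by chunk, emulating one vector op per chunk."""
    n = len(x)
    full_chunks, _ = strip_mine(n, width)
    out = list(y)
    vector_end = full_chunks * width
    for start in range(0, vector_end, width):
        # All `width` lanes apply the same operation: no control divergence.
        for i in range(start, start + width):
            out[i] = a * x[i] + y[i]
    # Remainder iterations that do not fill a whole vector run as scalar code.
    for i in range(vector_end, n):
        out[i] = a * x[i] + y[i]
    return out


if __name__ == "__main__":
    print("double lanes:", lanes(64), "float lanes:", lanes(32))
    print("100 doubles ->", strip_mine(100, lanes(64)))
```

For 100 doubles this gives 12 full 8-lane vectors and a 4-element remainder; the longer the branch-free loop, the larger the share of work that runs in full vectors.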
“…NAS application benchmarks of class B (medium) and C (large) are executed on Linux kernel version 2.6.32. These benchmarks feature wide variations in several execution properties, including parallelization annotations, varying sequential parts and compute- and memory-boundedness of threads [14]. The applications were initially profiled and performance annotated with execution times (Section III-A).…”
Section: A. Experimental Setup (mentioning; confidence: 99%)