2017
DOI: 10.1007/978-3-319-58667-0_16

An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors

Abstract: This paper presents a survey of architectural features among four generations of Intel server processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) with a focus on performance with floating-point workloads. Starting at the core level and going down the memory hierarchy, we cover instruction throughput for floating-point instructions, L1 cache, address generation capabilities, core clock speed and its limitations, L2 and L3 cache bandwidth and latency, the impact of Cluster on Die (CoD) and cac…

Cited by 17 publications (25 citation statements)
References 19 publications
“…Triad kernel from the STREAM benchmark was run in a tight loop on each core simultaneously, with problem sizes selected to ensure residency [footnote: https://ark.intel.com/] This portable methodology was previously shown to attain the same performance as handwritten benchmarks, which only work on their target architectures [18]. We evaluated three different compiler families for ThunderX2 in this study: GCC 7 and 8, the LLVM-based Arm HPC Compiler 18.2 and 18.3, and Cray's CCE 8.6 and 8.7. We believe this is the first study to date that has compared all three of these compilers targeting Arm.…”
Section: Platforms; citation type: mentioning
confidence: 99%
“…It is well known that manufacturing variations cause significant fluctuation across chips of the same type in terms of power dissipation [7,6]. This poses problems, e.g., when power capping is enforced because power variations then translate into performance variations [7], but it can also be leveraged for saving energy by intelligent scheduling [12].…”
Section: Energy Model and Validation; citation type: mentioning
confidence: 99%
“…In summary, our power model yields meaningful estimates of high quality with an error below 1% for relevant operating points (i.e., away from saturation and using more than a single core). In contrast to the work in [6], where the power/performance behavior was only observed empirically, we have presented an analytic model based on simplifying assumptions that can accurately describe the observed behavior. Fig.…”
Section: Energy Model and Validation; citation type: mentioning
confidence: 99%
“…The bandwidth is then modeled using the array size, number of iterations, and the time for the benchmark to run. This portable methodology was previously shown to attain the same performance as handwritten benchmarks, which only work on their target architectures …”
Section: Benchmarking Results; citation type: mentioning
confidence: 99%