Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering, 2022
DOI: 10.1145/3489525.3511689

Memory Performance of AMD EPYC Rome and Intel Cascade Lake SP Server Processors

Abstract: Modern processors, in particular within the server segment, integrate more cores with each generation. This increases their complexity in general, and that of the memory hierarchy in particular. Software executed on such processors can suffer from performance degradation when data is distributed disadvantageously over the available resources. To optimize data placement and access patterns, an in-depth analysis of the processor design and its implications for performance is necessary. This paper describes and e…

Cited by 16 publications (3 citation statements)
References: 20 publications
“…Further complicating matters, shared cache bandwidth is a complex topic since it is often tied to core and fabric clockspeeds. These and other aspects are studied in further detail by [56,57]. Once again, this is not the whole picture.…”
Section: Fair CPU and GPU Comparisons (mentioning)
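The excerpt above notes that shared cache bandwidth is often tied to core and fabric clock speeds. A minimal sketch of that dependency, assuming a simplified bottleneck model: the peak bandwidth of one link is bytes per cycle times clock frequency, and a transfer that crosses both the core and the fabric clock domain is limited by the slower link. The function names and all numbers are hypothetical illustrations, not values from the paper or the citing works.

```python
"""Simplified bottleneck model of shared-cache bandwidth across two clock domains.

Hypothetical illustration: the per-cycle widths and clock speeds used below are
made-up example values, not measurements of any specific processor.
"""


def link_bandwidth_gbs(bytes_per_cycle: float, clock_ghz: float) -> float:
    """Peak bandwidth of one link in GB/s.

    A clock of f GHz completes f cycles per nanosecond, and 1 byte/ns = 1 GB/s,
    so bytes_per_cycle * clock_ghz is directly in GB/s.
    """
    return bytes_per_cycle * clock_ghz


def shared_cache_bandwidth_gbs(core_bytes_per_cycle: float, core_ghz: float,
                               fabric_bytes_per_cycle: float, fabric_ghz: float) -> float:
    """Assumed model: a transfer passing through both the core clock domain and the
    fabric clock domain cannot exceed the slower of the two links."""
    core_side = link_bandwidth_gbs(core_bytes_per_cycle, core_ghz)
    fabric_side = link_bandwidth_gbs(fabric_bytes_per_cycle, fabric_ghz)
    return min(core_side, fabric_side)


if __name__ == "__main__":
    # Hypothetical example: equal link widths, but the fabric runs at a lower clock.
    bw = shared_cache_bandwidth_gbs(32, 3.0, 32, 1.6)
    print(f"{bw:.1f} GB/s")  # the fabric side (32 B x 1.6 GHz) is the bottleneck
```

With these hypothetical values the fabric side (32 B x 1.6 GHz = 51.2 GB/s) sets the limit, so under this model raising only the core clock would not increase the result.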
“…Additionally, the TLB latency overlaps with the latency for the L1 cache as the cache is virtually indexed and physically tagged, which requires a TLB lookup in parallel. Velten et al [VSIH22] benchmarked the AMD EPYC 7702, which uses a 7nm process, and the Intel Xeon Gold 6248, which is manufactured in 14nm. The results show that the latency for the L1 cache is between 1.6ns and 2ns.…”
Section: Hardware Requirements (mentioning)
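The excerpt citing [VSIH22] explains that the L1 cache is virtually indexed and physically tagged, so the TLB lookup proceeds in parallel with the cache access. A minimal sketch of the common design condition that makes this possible, assuming power-of-two sizes: the line-offset and set-index bits must fit inside the page offset, which is identical in virtual and physical addresses. The function names are hypothetical; the 32 KiB, 8-way, 64-byte-line, 4 KiB-page parameters are assumed example values (a commonly documented L1 data cache configuration for both Zen 2 and Cascade Lake cores), not figures taken from the cited paper.

```python
"""Check whether a set-associative cache's index bits come entirely from the
page offset, so it can be virtually indexed while the TLB translates in parallel."""


def log2_exact(n: int) -> int:
    """Return log2(n) for a positive power of two; raise ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def index_fits_in_page_offset(cache_bytes: int, ways: int,
                              line_bytes: int, page_bytes: int) -> bool:
    """True if line-offset bits + set-index bits <= page-offset bits.

    In that case the set can be selected from the untranslated low address bits
    while the TLB translates the page number in parallel; the physical tag from
    the TLB is then compared against the tags stored in the selected set.
    """
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = log2_exact(line_bytes)
    index_bits = log2_exact(sets)
    page_offset_bits = log2_exact(page_bytes)
    return offset_bits + index_bits <= page_offset_bits


if __name__ == "__main__":
    KiB = 1024
    # Assumed L1D parameters: 32 KiB, 8-way, 64-byte lines, 4 KiB pages -> 6 + 6 <= 12.
    print(index_fits_in_page_offset(32 * KiB, 8, 64, 4 * KiB))
    # Hypothetical larger cache with the same associativity -> 6 + 7 > 12.
    print(index_fits_in_page_offset(64 * KiB, 8, 64, 4 * KiB))
```

The second call shows the trade-off behind this design: under the same condition, a larger VIPT L1 needs higher associativity (or larger pages) to keep its index bits inside the page offset.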
“…5 in [28] the size of shared memory) of the SM. Although GPUs provide high memory bandwidth, the global memory access latency is also higher than that of CPUs [32,33,34]. Therefore, optimal throughput may be attained by covering memory requests with computational execution and hiding the latency of data movement.…”
Section: GPU Architecture (mentioning)
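The last excerpt argues that GPUs reach high throughput by overlapping memory requests with computation to hide their higher global-memory latency. Little's law gives a quick way to size that overlap: the amount of data that must be in flight equals bandwidth times latency. A minimal sketch follows; the function names and the latency, bandwidth, and request-size figures are hypothetical examples, not values from the cited works.

```python
"""Little's law applied to memory latency hiding.

Little's law: average number of items in a system = arrival rate x time in system.
For memory traffic: bytes in flight = bandwidth x latency.
"""
import math


def bytes_in_flight(latency_ns: float, bandwidth_gb_per_s: float) -> float:
    """Bytes that must be outstanding to sustain the given bandwidth.

    1 GB/s is 1e9 bytes per 1e9 ns, i.e. 1 byte/ns, so latency_ns * GB/s is in bytes.
    """
    return latency_ns * bandwidth_gb_per_s


def requests_in_flight(latency_ns: float, bandwidth_gb_per_s: float,
                       request_bytes: int) -> int:
    """Minimum number of independent requests of request_bytes each that must be
    outstanding (rounded up) to keep the memory system fully busy."""
    return math.ceil(bytes_in_flight(latency_ns, bandwidth_gb_per_s) / request_bytes)


if __name__ == "__main__":
    # Hypothetical device: 500 ns load latency, 1000 GB/s, 128-byte requests.
    print(bytes_in_flight(500, 1000))          # 500000 bytes
    print(requests_in_flight(500, 1000, 128))  # 3907 requests
```

With these assumed numbers, roughly 500 kB (3907 requests of 128 bytes) must be outstanding at once, which illustrates why latency hiding depends on keeping many independent memory operations in flight.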