Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture 2009
DOI: 10.1145/1669112.1669165

Comparing cache architectures and coherency protocols on x86-64 multicore SMP systems

Abstract: Across a broad range of applications, multicore technology is the most important factor that drives today's microprocessor performance improvements. Closely coupled is a growing complexity of the memory subsystems with several cache levels that need to be exploited efficiently to gain optimal application performance. Many important implementation details of these memory subsystems are undocumented. We therefore present a set of sophisticated benchmarks for latency and bandwidth measurements to arbitrary locati…

Cited by 97 publications (56 citation statements)
References 5 publications
“…As multicore, multichip servers are becoming widely used, especially as the number of processor packages increases, it is becoming necessary to revisit the impact of NUMA on the modern CMPs for some emerging workloads. Some recent work has measured NUMA-related performance in the state-of-the-art multicores using carefully designed synthetic benchmarks [11,26]. On the other hand, there is a wealth of research related to alleviating contention in memory subsystems including cache and bandwidth on current multicores [7,10,15,25,29,32,33,34,35].…”
Section: Related Work (mentioning)
confidence: 99%
“…For each data point, the two threads execute in lock step as shown in Figure 5 (similar measurements have been used in existing systems research [18,30,40,73]). Thread y brings the data in a modified state in its local caches and then thread x measures the latency of its own access to the shared data using the timestamp counter of the core [4].…”
Section: Context-to-context Latencies (mentioning)
confidence: 99%
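
The lock-step measurement described in the statement above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code of the cited papers: all names (`measure_min_ticks`, `writer_thread`, `struct shared`) are hypothetical, a 64-byte cache line is assumed, threads are not pinned to cores, and the timestamp reads are not serialized, so the result includes timer overhead. On non-x86 targets the sketch falls back to `clock_gettime`, which is a substitute for the timestamp counter rather than an equivalent.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* All names below are hypothetical and chosen for illustration only. */

/* Read a tick count: the timestamp counter on x86, a monotonic
 * nanosecond clock elsewhere (a substitute, not an equivalent). */
static inline uint64_t read_ticks(void) {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

enum { WRITER_TURN = 0, READER_TURN = 1, DONE = 2 };

struct shared {
    _Alignas(64) atomic_int turn;            /* lock-step flag, own line */
    _Alignas(64) volatile uint64_t line[8];  /* measured line (64 bytes assumed) */
};

/* "Thread y": writes the line so that it sits in a modified state in
 * this thread's caches, then hands the turn to the measuring thread. */
static void *writer_thread(void *arg) {
    struct shared *s = arg;
    for (;;) {
        int t;
        while ((t = atomic_load_explicit(&s->turn, memory_order_acquire))
               == READER_TURN)
            sched_yield();  /* keeps the sketch usable on a single core */
        if (t == DONE)
            break;
        s->line[0]++;       /* store dirties the line in the writer's cache */
        atomic_store_explicit(&s->turn, READER_TURN, memory_order_release);
    }
    return NULL;
}

/* "Thread x": waits for its turn, times one load of the shared line, and
 * returns the smallest tick difference seen (UINT64_MAX on thread error). */
uint64_t measure_min_ticks(int iterations) {
    assert(iterations > 0);
    static struct shared s;
    s.line[0] = 0;
    atomic_store(&s.turn, WRITER_TURN);

    pthread_t writer;
    if (pthread_create(&writer, NULL, writer_thread, &s) != 0)
        return UINT64_MAX;

    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iterations; i++) {
        while (atomic_load_explicit(&s.turn, memory_order_acquire)
               != READER_TURN)
            sched_yield();
        uint64_t t0 = read_ticks();
        uint64_t value = s.line[0];  /* the timed access to the shared data */
        uint64_t t1 = read_ticks();
        (void)value;
        if (t1 - t0 < best)
            best = t1 - t0;
        atomic_store_explicit(&s.turn,
                              i + 1 < iterations ? WRITER_TURN : DONE,
                              memory_order_release);
    }
    pthread_join(writer, NULL);
    return best;
}
```

Calling `measure_min_ticks(100000)` from a `main` function and printing the result gives a rough lower bound in ticks. For numbers that can be compared across core pairs, each thread would also need to be pinned to a specific core and the timed access serialized; both are left out of the sketch.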
“…Peng et al. [33,34] analyze the memory hierarchy of early dual-core processors from Intel and AMD and demonstrate their respective characteristics. In [28], Hackenberg et al. conduct a comprehensive investigation of the cache structures of advanced quad-core multiprocessors. In recent years, comparison between general-purpose GPUs has become a promising topic.…”
Section: Related Work (mentioning)
confidence: 99%