Proceedings of the 4th Annual International Conference on Systems and Storage 2011
DOI: 10.1145/1987816.1987832
Memory system performance in a NUMA multicore multiprocessor

Abstract: Modern multicore processors with an on-chip memory controller form the base for NUMA (non-uniform memory architecture) multiprocessors. Each processor accesses part of the physical memory directly and has access to the other parts via the memory controller of other processors. These other processors are reached via the cross-processor interconnect. As a consequence a processor's memory controller must satisfy two kinds of requests: those that are generated by the local cores and those that arrive via the inter…
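The local/remote distinction described in the abstract can be made concrete with a small probe. The sketch below is not from the paper; it assumes a Linux machine with libnuma, and the node numbers, buffer size, and stride are illustrative. It times a walk over a buffer bound to the probe's own node and over a buffer bound to another node.

/* Minimal sketch (assumption: Linux with libnuma, not code from the paper):
 * contrasts local vs. remote memory access on a NUMA machine.
 * Build with: gcc -O2 numa_probe.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (256UL * 1024 * 1024)   /* 256 MiB working set, illustrative */

/* Walk the buffer one cache line at a time and return elapsed seconds;
 * a crude stand-in for a memory-bandwidth/latency probe. */
static double touch(char *buf, size_t size)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < size; i += 64)
        buf[i]++;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }
    int last = numa_max_node();

    /* Run the probe on node 0 ... */
    numa_run_on_node(0);

    /* ... but bind one buffer to node 0 (local) and one to the
     * highest-numbered node (remote, if the machine has more than one node). */
    char *local  = numa_alloc_onnode(BUF_SIZE, 0);
    char *remote = numa_alloc_onnode(BUF_SIZE, last);
    if (!local || !remote) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(local, 1, BUF_SIZE);    /* fault pages in before timing */
    memset(remote, 1, BUF_SIZE);

    printf("local  (node 0) : %.3f s\n", touch(local, BUF_SIZE));
    printf("remote (node %d): %.3f s\n", last, touch(remote, BUF_SIZE));

    numa_free(local, BUF_SIZE);
    numa_free(remote, BUF_SIZE);
    return 0;
}

On a typical two-socket system the remote walk takes noticeably longer; this asymmetry, and the load placed on the memory controller by both local and interconnect-forwarded requests, is what the paper's measurements examine.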

Cited by 70 publications (43 citation statements). References 25 publications.
“…In [15] the authors evaluate the memory performance of NUMA machines. One of the main findings is how guaranteeing data locality to sockets need not be optimal always, due to increased pressure on local memory bandwidth.…”
Section: Related Work (mentioning)
confidence: 99%
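The finding summarized in the snippet above, that forcing all data onto the local node can overload that node's memory controller, is commonly countered by interleaving pages across nodes. A minimal sketch, again assuming Linux with libnuma and an arbitrary buffer size, not code from the paper:

/* Sketch (assumption: Linux with libnuma): instead of binding a large buffer
 * to the local node, interleave its pages round-robin over all nodes so that
 * no single memory controller carries the whole load.
 * Build with: gcc -O2 interleave.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support\n");
        return 1;
    }
    size_t size = 1UL << 30;                    /* 1 GiB, illustrative */
    char *buf = numa_alloc_interleaved(size);   /* pages spread over all nodes */
    if (!buf) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(buf, 0, size);   /* faulting the pages in places them per the interleave policy */
    printf("interleaved 1 GiB across %d nodes\n", numa_num_configured_nodes());
    numa_free(buf, size);
    return 0;
}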
“…Majo and Gross investigated the NUMA-memory contention problem and developed a model to characterize the sharing of local and remote memory bandwidth [15]. Fedorova et al designed a contention-aware algorithm Carrefour to manage memory traffic congestion in the Linux OS [11].…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent work shows that contentions on the hardware prefetcher [25], the memory controller [27,30] and the DRAM bus [11] can also cause significant performance slowdown in both UMA and NUMA systems. Last-level cache miss rate has been widely used as a proxy for the contention on shared resources [7,8,9,14,26] and the similarity in thread address spaces has been used to quantify the inter-thread sharing activity [5,35,38].…”
Section: Optimization Via Scheduling (mentioning)
confidence: 99%
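The last snippet above notes that last-level cache miss rate is widely used as a proxy for contention on shared resources. As an illustration only (not code from any of the cited works), the sketch below counts LLC read misses for the calling process through the Linux perf_event_open interface; the arithmetic loop merely stands in for a real workload.

/* Sketch (assumption: Linux with perf events enabled): count last-level
 * cache read misses for the current process over a region of interest. */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <stdio.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof(attr);
    /* last-level cache, read accesses, misses */
    attr.config = PERF_COUNT_HW_CACHE_LL |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = perf_event_open(&attr, 0 /* this process */, -1 /* any CPU */, -1, 0);
    if (fd < 0) { perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* region of interest: stand-in for the workload being profiled */
    volatile long sum = 0;
    for (long i = 0; i < 100000000L; i++) sum += i;

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t misses = 0;
    if (read(fd, &misses, sizeof(misses)) != sizeof(misses)) {
        perror("read");
        return 1;
    }
    printf("LLC read misses: %llu\n", (unsigned long long)misses);
    close(fd);
    return 0;
}

Dividing the counted misses by elapsed time (or by instructions retired) yields the miss rate that the contention-aware scheduling work cited above uses as its signal.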