Proceedings of the 2012 International Symposium on Memory Management
DOI: 10.1145/2258996.2259000
Memory management for many-core processors with software configurable locality policies

Cited by 18 publications (6 citation statements)
References 28 publications
“… (…, current node n, cores per node C)
(2) Populate …[1 : …] with bytes in T.depend list;
(3) if …(…) > …(…)/… and V(…) > 0 then
(4) find … with least NUMA distance-weighted cost to …;
(5) enqueue(…, T);
(6) else
(7) enqueue(…, T);
…
(9) find … with least home cache latency cost to …;
(10) enqueue(…, T);
(11) else
(12) enqueue(…, T);
(13) end
(14) end
(15) else
(16) enqueue(…, T);
(17) end
(18) end
Algorithm 3: Work-dealing algorithm for TILEPro64.
… spent waiting for memory by counting dispatch stall cycles, which includes load/store unit stall cycles [13].…”
Section: Potential For Performance Improvements (mentioning)
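
The Algorithm 3 excerpt lost its symbols in extraction, so its exact conditions cannot be recovered. As a rough illustration of the one idea it states plainly, dealing a task to the node with the least NUMA distance-weighted cost to the bytes it depends on, here is a minimal Python sketch. The names `weighted_cost` and `deal_task`, the per-node byte counts, and the distance matrix are hypothetical; the sketch omits the algorithm's threshold test and its TILEPro64 home-cache branch, and it is not the cited authors' implementation.

```python
from collections import deque
from typing import Deque, List, Sequence


def weighted_cost(target: int, bytes_per_node: Sequence[int],
                  distance: Sequence[Sequence[int]]) -> int:
    """Cost of running a task on `target`: each node's dependency bytes,
    weighted by the NUMA distance from `target` to that node."""
    return sum(b * distance[target][node]
               for node, b in enumerate(bytes_per_node))


def deal_task(task: object, bytes_per_node: Sequence[int],
              distance: Sequence[Sequence[int]], current_node: int,
              queues: List[Deque[object]]) -> int:
    """Hypothetical work-dealing step: push `task` onto the queue of the
    node with the least distance-weighted cost and return that node.
    Tasks with no recorded dependency bytes stay on `current_node`."""
    if sum(bytes_per_node) == 0:
        target = current_node
    else:
        # On equal cost, prefer the current node (False sorts before True).
        target = min(range(len(queues)),
                     key=lambda q: (weighted_cost(q, bytes_per_node, distance),
                                    q != current_node))
    queues[target].append(task)
    return target


if __name__ == "__main__":
    # Assumed 2-node distance matrix in the style of ACPI SLIT values.
    dist = [[10, 20], [20, 10]]
    queues: List[Deque[object]] = [deque(), deque()]
    # All 4096 dependency bytes live on node 1, so the task is dealt there.
    print(deal_task("T1", [0, 4096], dist, 0, queues))  # 1
```

Preferring the current node on ties keeps a task local when moving it would not reduce the weighted cost, which avoids needless migration.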
“…Tousimojarad and Vanderbauwhede [33] cleverly reduce access latencies to uniformly distributed data by using copies whose home cache is local to the accessing thread on the TILEPro64 processor. Zhou and Demsky [2] build a NUMA-aware adaptive garbage collector that migrates objects to improve locality on manycore processors. We target standard OpenMP programs written in C, which makes it difficult to migrate objects.…”
Section: Related Work (mentioning)
“…Classified as shared, however, the mapping reverts back to being statically mapped. Recent proposals have also explored replication [50,51], coherence protocol based optimization [52,53,54] and software configurable policies [55,56], trading implementation complexity for performance. With regard to cache line placement, this paper explores runtime modification of the home node and how to support it at the software-hardware interface.…”
Section: Related Work (mentioning)
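
The excerpt above contrasts statically mapped home nodes with software-configurable placement and runtime changes to a cache line's home node. The following Python sketch models that contrast under loudly stated assumptions: a default interleaving of lines across tiles by `line_index % num_tiles` (an illustrative hash, not the TILEPro64 hardware's actual hash-for-home function), a 64-byte line, a 4 KiB page, and a per-page override table that software can set and clear at run time. `HomeMap`, `home_of`, `set_home`, and `clear_home` are hypothetical names, not a real TILE or Linux API.

```python
from typing import Dict

LINE_SIZE = 64     # assumed cache-line size in bytes
PAGE_SIZE = 4096   # assumed page size in bytes


class HomeMap:
    """Hypothetical model of home-tile selection: a static default mapping
    plus per-page overrides that software may install or remove at run time."""

    def __init__(self, num_tiles: int) -> None:
        self.num_tiles = num_tiles
        self.page_home: Dict[int, int] = {}  # page number -> home tile

    def home_of(self, addr: int) -> int:
        page = addr // PAGE_SIZE
        if page in self.page_home:
            # Software-configured policy: every line in the page shares one home.
            return self.page_home[page]
        # Static default: interleave lines across tiles (illustrative hash only).
        return (addr // LINE_SIZE) % self.num_tiles

    def set_home(self, addr: int, tile: int) -> None:
        """Re-home the whole page containing `addr` to `tile`."""
        if not 0 <= tile < self.num_tiles:
            raise ValueError(f"tile {tile} out of range")
        self.page_home[addr // PAGE_SIZE] = tile

    def clear_home(self, addr: int) -> None:
        """Revert the page containing `addr` to the static mapping."""
        self.page_home.pop(addr // PAGE_SIZE, None)
```

With `HomeMap(64)`, address 64 is homed on tile 1 by default; after `set_home(0, 5)` every line in page 0, including address 64, is homed on tile 5, and `clear_home` restores the static mapping.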
“…However, we still use interleaved spaces for the old and permanent generations, as these generations use a compacting algorithm. Zhou and Demsky [32] propose a NUMA-aware compaction algorithm, but this is out of our scope. Furthermore, our results show that using a fragmented space for the other generations is not required to make the garbage collector scale.…”
Section: Fragmented and Segregated Spaces (mentioning)
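
The excerpt above distinguishes interleaved spaces from fragmented ones. A minimal Python sketch of the locality difference, under a simplified model of our own: an interleaved space places page p on node `p % num_nodes`, a fragmented space places each page on the node of the thread that allocated into it, and locality is the fraction of (accessing node, page) accesses that hit node-local memory. The function names are hypothetical, and the model ignores compaction, which the quote gives as the reason for keeping interleaved spaces for the old and permanent generations.

```python
from typing import List, Sequence, Tuple


def interleaved_placement(num_pages: int, num_nodes: int) -> List[int]:
    """Interleaved space: page p lives on node p mod num_nodes."""
    return [p % num_nodes for p in range(num_pages)]


def fragmented_placement(allocating_node: Sequence[int]) -> List[int]:
    """Fragmented space (simplified): each page lives on the node of the
    thread that allocated into it."""
    return list(allocating_node)


def local_fraction(placement: Sequence[int],
                   accesses: Sequence[Tuple[int, int]]) -> float:
    """Fraction of (accessing_node, page) accesses that hit node-local memory."""
    if not accesses:
        return 0.0
    local = sum(1 for node, page in accesses if placement[page] == node)
    return local / len(accesses)
```

With pages 0 and 1 allocated from node 0, pages 2 and 3 from node 1, and each page accessed only by its allocator, interleaving gives a local fraction of 0.5 while the fragmented placement gives 1.0.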