2014
DOI: 10.1109/l-ca.2012.30

Thread Migration Prediction for Distributed Shared Caches

Abstract: Chip-multiprocessors (CMPs) have become the mainstream parallel architecture in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Access (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data is cached. Improving data locality is thus key to performance, and several studies have addressed this pr…

Cited by 12 publications (10 citation statements)
References 13 publications
“…Thread migration may be achieved at the user level, kernel level, or application level. Shim, Lis, Khan, and Devadas (2014) considered a hardware-level thread migration mechanism. They argued that this method can better exploit shared data locality in NUCA (Non-Uniform Cache Architecture) designs by replacing multiple round-trip remote cache accesses with fewer migrations.…”
Section: Thread Migration (mentioning)
confidence: 99%
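The statement above summarizes the trade-off the paper exploits: a run of accesses to remotely cached data can be served either by repeated round trips or by a single migration. A minimal, illustrative cost-model sketch follows; the names and cycle costs are assumptions, not values from the paper.

```cpp
// Minimal cost-model sketch (illustrative only): a thread that will make N
// consecutive accesses to data homed on a remote core can either issue N
// remote round trips or migrate once to that core.
#include <cstdint>

struct Costs {
    uint32_t remote_round_trip;   // assumed latency of one remote cache access (cycles)
    uint32_t migration;           // assumed latency of moving the thread context (cycles)
};

// Migrating pays off when its one-time cost is below the total cost of the
// remote accesses it replaces.
bool should_migrate(uint32_t consecutive_remote_accesses, const Costs& c) {
    uint64_t remote_total =
        static_cast<uint64_t>(consecutive_remote_accesses) * c.remote_round_trip;
    return c.migration < remote_total;
}
```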
“…Therefore, our migration predictor focuses on detecting those. Compared to the predictor presented in [16], which only supports full-context migration, we further reduce migration costs by sending only a part of the register file when a thread migrates (usually, only some of the registers are used between the time the thread migrates out of its native core and the time it returns). With the deadlock-free migration framework of [5], the native-core register file remains intact even if a thread migrates away, because its context is not used by any other guest threads.…”
Section: Thread Migration Predictor (mentioning)
confidence: 99%
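The partial-context migration described in this statement sends only the registers a thread is expected to need while away from its native core. A rough sketch, assuming a 32-entry integer register file and a hypothetical message layout:

```cpp
// Illustrative sketch of partial-context migration; the message fields and
// register-file size are assumptions, not taken from the cited papers.
#include <array>
#include <cstdint>
#include <vector>

constexpr int kNumRegs = 32;

struct MigrationMessage {
    uint64_t pc;                      // program counter at the migration point
    uint32_t reg_mask;                // bit i set => register i travels with the thread
    std::vector<uint64_t> reg_values; // only the registers selected by reg_mask
};

// Pack only the registers the predictor expects the thread to use while away;
// the rest stay behind in the intact native-core register file.
MigrationMessage pack_partial_context(uint64_t pc, uint32_t predicted_mask,
                                      const std::array<uint64_t, kNumRegs>& regs) {
    MigrationMessage msg{pc, predicted_mask, {}};
    for (int i = 0; i < kNumRegs; ++i)
        if (predicted_mask & (1u << i)) msg.reg_values.push_back(regs[i]);
    return msg;
}
```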
“…Thread migration has also been used to provide memory coherence among per-core caches [10] using a deadlock-free fine-grained thread migration protocol [5]; we adopt the same protocol for our hybrid framework. Although a migration predictor that decides between migrations and remote accesses is introduced in [16], it does not address the overhead of high network traffic for thread migration. This paper proposes a novel migration predictor that supports partial context migration, improving both performance and network traffic.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although our ISA allows the programmer to directly specify whether the instruction should migrate or execute via remote cache access, in general this decision can be dynamic and dependent on the phase of the program; therefore, EM² relies on an automatic hardware migration predictor [19] in each tile.…”
Section: E. Migration Decision Scheme (mentioning)
confidence: 99%
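The per-tile hardware migration predictor mentioned here decides, per memory instruction, whether the thread should migrate or issue a remote cache access. A hedged sketch of one plausible organization (a small direct-mapped, PC-indexed table; the size and tag scheme are assumptions):

```cpp
// Hedged sketch of a per-tile, PC-indexed migration predictor; the
// direct-mapped organization and table size are illustrative assumptions.
#include <array>
#include <cstdint>

struct PredictorEntry {
    uint64_t pc_tag = 0;
    bool     valid  = false;
};

class MigrationPredictor {
    static constexpr unsigned kEntries = 128;        // hypothetical table size
    std::array<PredictorEntry, kEntries> table_{};
public:
    // A hit means this memory instruction was previously learned as
    // migration-worthy; a miss falls back to a remote cache access.
    bool predict_migrate(uint64_t pc) const {
        const PredictorEntry& e = table_[pc % kEntries];
        return e.valid && e.pc_tag == pc;
    }
    // Record a PC whose accesses were observed to benefit from migration.
    void learn(uint64_t pc) {
        table_[pc % kEntries] = {pc, true};
    }
};
```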
“…While the predictor described in [19] only supports full-context migration, the migration predictor of EM² further supports stack-based partial context migration. Each predictor entry consists of a tag for the PC and the transfer sizes for the main and auxiliary stacks upon migrating a thread.…”
Section: E. Migration Decision Scheme (mentioning)
confidence: 99%
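The entry format quoted above (a PC tag plus transfer sizes for the two stacks of the EM² stack machine) might look roughly like the following; field widths and the helper are illustrative, not taken from the paper.

```cpp
// Sketch of an EM2-style predictor entry: a PC tag plus how many words of the
// main and auxiliary stacks to transfer on a migration. Field widths and the
// lookup helper are assumptions for illustration.
#include <cstdint>
#include <optional>
#include <utility>

struct StackMigrationEntry {
    uint64_t pc_tag = 0;             // memory instruction this entry was learned for
    uint8_t  main_stack_words = 0;   // words of the main stack to send on migration
    uint8_t  aux_stack_words = 0;    // words of the auxiliary stack to send
    bool     valid = false;
};

// On a hit, the pair of transfer sizes bounds the migrated context; on a miss,
// a full-context migration (or a remote access) would be the fallback.
std::optional<std::pair<uint8_t, uint8_t>>
lookup_transfer_sizes(const StackMigrationEntry& e, uint64_t pc) {
    if (e.valid && e.pc_tag == pc)
        return std::make_pair(e.main_stack_words, e.aux_stack_words);
    return std::nullopt;
}
```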