The shared-thread multiprocessor

Brown, Jeffery A.; Tullsen, Dean M.

doi:10.1145/1375527.1375541

Cited by 24 publications

(15 citation statements)

References 20 publications


“…Previous work [3,38] describes support mechanisms for migrating register state in order to decrease the latency of thread activation and deactivation; however, performance subsequent to migration still suffers due to cold-cache effects. Our work is complementary; we specifically address the post-migration cache misses which limit the gains of those techniques.…”

Section: Background and Related Work (mentioning)

confidence: 99%


Fast thread migration via cache working set prediction. Brown, Porter, Tullsen. 2011 IEEE 17th International Symposium on High Performance Computer Architecture, 2011. Self-citation.



“…The cores of our CMP feature hardware support for thread activation and deactivation, as found in prior studies of thread scheduling [3,38]. While those works used hardware support to implement scheduling and time-sharing policies, we use it simply for adding and removing threads from cores.…”

Section: Baseline Multicore Architecture (mentioning)

confidence: 99%

Fast thread migration via cache working set prediction. Brown, Porter, Tullsen. 2011 IEEE 17th International Symposium on High Performance Computer Architecture, 2011. Self-citation.


“…Pseudo-parallelism shares some technical issues in common, related to the need for synchronisation between running programs. Figure 1 is a somewhat detailed view of the running of the four programs, labelled P1, P2, P3, and P4, in multiprogramming mode [5]. In the top part of the figure, all four programs seem to be running in parallel.…”

Section: Baseline Architecture (mentioning)

confidence: 99%

An Attempt to Improve the Processor Performance by Proper Memory Management for Branch Handling. Abraham, Mathew. IJCSEA, 2013.


“…If there is data in cache on the user processor that must be accessed by the OS core, it must be transferred to the OS core (automatically handled by the coherence mechanism). The aggressive scheme is based on the technique proposed by Brown and Tullsen [9] and is assumed to incur a 100 cycle migration latency. They advocate hardware support for book-keeping and thread scheduling (normally done in software by an OS or virtual machine).…”

Section: Background and Motivation (mentioning)

confidence: 99%

“…The work by Brown and Tullsen [9], for example, attempts to design a low-latency process migration mechanism that is an important technology for OS off-load. Similarly, in this paper we assume that OS off-load is a promising approach and we attempt to resolve another component of OS off-load that may be essential for its eventual success, viz, the decision-making process that determines which operations should be off-loaded.…”

Section: Introduction (mentioning)

confidence: 99%

Improving Server Performance on Multi-cores via Selective Off-Loading of OS Functionality. Nellans, Sudan, Brunvand, et al. Computer Architecture, 2011.

Abstract. Modern and future server-class processors will incorporate many cores. Some studies have suggested that it may be worthwhile to dedicate some of the many cores for specific tasks such as operating system execution. OS off-loading has two main benefits: improved performance due to better cache utilization and improved power efficiency due to smarter use of heterogeneous cores. However, OS off-loading is a complex process that involves balancing the overheads of off-loading against the potential benefit, which is unknown while making the off-loading decision. In prior work, OS off-loading has been implemented by first profiling system call behavior and then manually instrumenting some OS routines (out of hundreds) to support off-loading. We propose a hardware-based mechanism to help automate the off-load decision-making process, and provide high-quality dynamic decisions via performance feedback. Our mechanism dynamically estimates the off-load requirements of the application and relies on a run-length predictor for the upcoming OS system call invocation. The resulting hardware-based off-loading policy yields a throughput improvement of up to 18% over a baseline without off-loading, 13% over a static software-based policy, and 23% over a dynamic software-based policy.
