Threaded multiple path execution

Wallace, Steven; Calder, Brad; Tullsen, Dean M.

doi:10.1145/279361.279392

Cited by 29 publications

(33 citation statements)

References 9 publications

“…We have used a modified ICOUNT policy which is related to a modification presented by Wallace et al for Threaded Multiple Path Execution (TME) [18]. In TME, the alternate paths of hard-to-predict branches are executed in free thread contexts on an SMT.…”

Section: SMT Thread Priority (mentioning)

confidence: 99%
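The ICOUNT policy named in the statement above is commonly described as giving fetch priority to the SMT thread with the fewest instructions in the front-end stages and issue queues. The following is a minimal sketch of that ordering. The `ThreadContext` type, the `is_alternate` flag, and the `alternate_penalty` parameter are hypothetical names introduced for illustration; the penalty is only one conceivable way to bias fetch toward primary paths and is not taken from either paper.

```python
from dataclasses import dataclass


@dataclass
class ThreadContext:
    tid: int
    in_flight: int              # instructions in decode, rename, and issue queues
    is_alternate: bool = False  # hypothetical: context is running an alternate path


def icount_order(contexts, alternate_penalty=0):
    """Return thread ids from highest to lowest fetch priority.

    Plain ICOUNT sorts by in-flight instruction count (fewest first).
    alternate_penalty is an assumed knob that inflates the count of
    alternate-path contexts so primary paths are fetched first.
    """
    def key(ctx):
        penalty = alternate_penalty if ctx.is_alternate else 0
        return (ctx.in_flight + penalty, ctx.tid)  # tid breaks ties deterministically

    return [ctx.tid for ctx in sorted(contexts, key=key)]
```

With `alternate_penalty=0` this reduces to plain ICOUNT; a positive penalty lets a primary thread overtake an alternate path that has slightly fewer instructions in flight.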

Dual-thread Speculation: A Simple Approach to Uncover Thread-level Parallelism on a Simultaneous Multithreaded Processor

Warg

Stenström

2008

Int J Parallel Prog

As chip multiprocessors with simultaneous multithreaded cores are becoming commonplace, there is a need for simple approaches to exploit thread-level parallelism. In this paper, we consider thread-level speculation as a means to reap thread-level parallelism out of application binaries. We first investigate the tradeoffs between scheduling speculative threads on the same core and on different cores. While threads contend for the same resources using the former approach, the latter approach is plagued by the overhead for inter-core communication. Despite the impact of resource contention, our detailed simulations show that the first approach provides the best performance due to lower inter-thread communication cost. The key contribution of the paper is the proposed design and evaluation of the dual-thread speculation system. This design point has very low complexity and reaps most of the gains of a system.

“…Examples include the multiscalar architecture [18], threaded multiple path execution [20], thread-level data speculation [19], speculative data-driven multithreading [14], and slipstream processors [12]. Even though the idea to reuse already computed results sounds appealing, it introduces additional hardware complexity and increases the design and verification costs.…”

Section: Related Work (mentioning)

confidence: 99%

Future execution: a hardware prefetching technique for chip multiprocessors

Ganusov

Burtscher

2005

14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)

This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-control instructions in the prefetching core after they have executed in the primary core. On the way to the second core, each instruction's output is replaced by a prediction of the likely output that the n th future instance of this instruction will produce. Speculatively executing the resulting instruction stream on the second core issues load requests that the main program will probably reference in the future. Unlike previously proposed thread-based prefetching approaches, our technique does not need any thread spawning points, features an adjustable lookahead distance, does not require complicated analyzers to extract prefetching threads, is recovery-free, and necessitates no storage for the prefetching threads. We demonstrate that for the SPECcpu2000 benchmark suite, our mechanism significantly increases the prefetching coverage and improves the primary core's performance by 10% on average over a baseline that already includes an aggressive hardware stream prefetcher. We further show that our approach works well in combination with runahead execution.
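The step in the abstract above, replacing an instruction's output with a prediction of what its n-th future instance will produce, can be illustrated with a simple value predictor. This is a minimal sketch that assumes a per-instruction stride predictor; the abstract does not say which predictor is used, and `StridePredictor`, `update`, and `predict_future` are hypothetical names.

```python
class StridePredictor:
    """Per-instruction stride predictor (an assumed, illustrative scheme)."""

    def __init__(self):
        self.table = {}  # pc -> (last_value, stride)

    def update(self, pc, value):
        """Record the value just produced by the instruction at pc."""
        last, _ = self.table.get(pc, (value, 0))  # first sighting gives stride 0
        self.table[pc] = (value, value - last)

    def predict_future(self, pc, n):
        """Predict the output of the n-th future instance, or None if pc is unseen."""
        entry = self.table.get(pc)
        if entry is None:
            return None
        last, stride = entry
        return last + n * stride
```

Here `n` plays the role of the adjustable lookahead distance: a load whose address comes from `predict_future(pc, n)` touches data the main thread would reach about `n` iterations later, provided the stride pattern holds.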

“…First, we describe the TME proposal, [9]. Then we describe our simplified experiments and present the results combined with selective cluster sharing.…”

Section: Evaluation Of Multi-path Execution (mentioning)

confidence: 99%

“…al. propose TME [9], which consists of using the spare contexts of an SMT processor to speculatively execute along the less likely path of hard-to-predict branches. They consider executing multiple alternate paths of more than one primary thread, using confidence estimation to select which branches to fork.…”

Section: Threaded Multi-path Execution (TME) (mentioning)

confidence: 99%
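The fork decision described in the statement above (use a spare context for the alternate path only when a branch is hard to predict) can be sketched as follows. This is an illustration under stated assumptions: the resetting-counter estimator, the threshold values, and the names `ResettingConfidence` and `maybe_fork` are hypothetical and are not confirmed as the mechanism used in TME.

```python
class ResettingConfidence:
    """Per-branch confidence counter: counts correct predictions, resets on a miss."""

    def __init__(self, threshold=4, max_count=15):
        self.threshold = threshold  # assumed cut-off for "high confidence"
        self.max_count = max_count  # saturation limit of the counter
        self.counters = {}          # branch pc -> consecutive correct predictions

    def is_confident(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, predicted_correctly):
        if predicted_correctly:
            self.counters[pc] = min(self.counters.get(pc, 0) + 1, self.max_count)
        else:
            self.counters[pc] = 0


def maybe_fork(pc, estimator, free_contexts):
    """Claim and return a spare context id for the alternate path, or None.

    A fork happens only for a low-confidence branch and only if a
    hardware context is idle; free_contexts is mutated when one is claimed.
    """
    if estimator.is_confident(pc) or not free_contexts:
        return None
    return free_contexts.pop()
```

A branch that has never been seen starts at zero and is treated as low confidence, so it forks whenever a context is free; after `threshold` consecutive correct predictions it stops consuming spare contexts until it mispredicts again.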

“…In this paper we propose and analyze several mechanisms and policies to exploit the idle fetch clusters on an MT processor to improve single-thread execution without any multi-thread performance loss. We analyze three simultaneous alternatives: (1) allowing one thread to use the instruction cache and branch prediction storage of its neighbors (selective cluster sharing), (2) providing multiple-path execution (Wallace et al. [9]), or (3) widening the effective single-thread fetch block (fetch block widening). Section 2 describes the baseline, clustered MT frontend and performs a preliminary analysis of the effects on single-thread performance of adding clustered, multithread capability to a single-thread processor.…”

Section: Introduction (mentioning)

confidence: 99%

Improving single-thread fetch performance on a multithreaded processor

Moure

García

Rexachs

et al.

Proceedings Euromicro Symposium on Digital Systems Design

Multithreaded processors, by simultaneously using both the thread-level parallelism and the instruction-level parallelism of applications, achieve higher instructions-per-cycle rates than single-thread processors. On a multithread workload, a clustered organization maximizes performance. On a single-thread workload, however, all but one of the clusters are idle, degrading single-thread performance significantly. Using a clustered multithreaded processor optimized for multi-thread performance as a baseline, we propose and analyze several mechanisms and policies to improve single-thread execution by exploiting the existing hardware without a significant multi-thread performance loss. We focus on the fetch unit, which is perhaps the most performance-critical stage. Essentially, we analyze three ways of exploiting the idle fetch clusters: allowing a single thread to access its neighbor clusters, using the idle fetch clusters to provide multiple-path execution, or using them to widen the effective single-thread fetch block.

Threaded multiple path execution

Abstract: This paper presents Threaded Multi-Path Execution (TME), which exploits existing hardware on a Simultaneous Multithreading (SMT)
