Loop Selection for Thread-Level Speculation

Wang, Shengyue; Dai, Xiaoru; Yellajyosula, Kiran; Zhai, Antonia; Yew, Pen-Chung

doi:10.1007/978-3-540-69330-7_20

Cited by 36 publications

(52 citation statements)

References 26 publications

“…The compiler also generates synchronization instructions for frequently occurred cross-iteration data dependences. Our compiler framework and the loop selection methodology are described in detail in [14]. Table 2 shows the details of benchmarks (from SPEC2000) used to evaluate our scheme.…”

Section: Experimental Methodology (mentioning)

confidence: 99%

Supporting Speculative Multithreading on Simultaneous Multithreaded Processors

Packirisamy, Wang, Zhai, et al. 2006

High Performance Computing - HiPC 2006

Self Cite

Abstract. Speculative multithreading is a technique that has been used to improve single-thread performance. Speculative multithreading architectures for chip multiprocessors (CMPs) have been extensively studied, but there have been relatively few studies on the design of speculative multithreading for simultaneous multithreading (SMT) processors. The current SMT-based designs, IMT [9] and DMT [2], use a load/store queue (LSQ) to perform dependence checking. Since the size of the LSQ is limited, this design is suitable only for small threads. In this paper we present novel cache-based architecture support for speculative simultaneous multithreading that can efficiently handle larger threads. In our architecture, the associativity in the cache is used to buffer speculative values. Our 4-thread architecture achieves about 15% speedup over the equivalent superscalar processors and about 3% speedup on average over the LSQ-based architectures, with less complex hardware. Our scheme also performs 14% better than the LSQ-based scheme for larger threads.

“…Due to this consideration, LSQ based architectures can support only small threads. But our research [14] shows that if we need to consider a more realistic overhead of forking a thread, it becomes more difficult to justify at small granularities. Hence, it is important to support larger threads.…”

Section: Introduction (mentioning)

confidence: 99%

Supporting Speculative Multithreading on Simultaneous Multithreaded Processors

Packirisamy, Wang, Zhai, et al. 2006

High Performance Computing - HiPC 2006

Self Cite

“…Whaley and Kozyrakis [13] proposed three classes of heuristics for method-level speculation, and found that single-pass heuristics lead to best speedups while simple/complex multi-pass heuristics tend to over/under speculation. Wang et al [12] constructed a loop-graph and used it for global loop selection to maximize program performance. Liu et al [8] proposed an online-profiling approach to speculatively parallelize candidate loops.…”

Section: Related Work (mentioning)

confidence: 99%

Adaptive Fork-Heuristics for Software Thread-Level Speculation

Cao, Verbrugge 2014

Lecture Notes in Computer Science

Abstract. Fork-heuristics play a key role in software Thread-Level Speculation (TLS). Current fork-heuristics either lack real parallel execution environment information to accurately evaluate fork points and/or focus on hardware-TLS implementation which cannot be directly applied to software TLS. This paper proposes adaptive fork-heuristics as well as a feedback-based selection technique to overcome the problems. Adaptive fork-heuristics insert and speculate on all potential fork/join points and purely rely on the runtime system to disable inappropriate ones. Feedback-based selection produces parallelized programs with ideal speedups using log files generated by adaptive heuristics. Experiments of three scientific computing benchmarks on a 64-core machine show that feedback-based selection and adaptive heuristics achieve more than 88% and 50% speedups of the manual-parallel version, respectively. For the Barnes-Hut benchmark, feedback-based selection is 49% faster than the manual-parallel version.

“…To the best of our knowledge, existing loop-oriented techniques [3,10,11,13,14,15] form threads only at loop boundaries by turning loop iterations into threads. In [10,14,15], frequently occurring dependences are synchronized.…”

Section: Related Work (mentioning)

confidence: 99%

“…If we turn loop iterations directly into speculative threads as in existing loop-oriented compiler techniques [3,10,11,13,14,15], some inter-iteration dependences in the loop, which become inter-thread dependences at run time, can be too costly to enforce. Furthermore, value prediction may not be effective for irregular loops accessing arrays with…”

Section: Introduction (mentioning)

confidence: 99%