Optimizations and oracle parallelism with dynamic translation

Ebcioğlu, Kemal; Altman, Erik R.; Sathaye, Sumedh W.; Gschwind, Michael

doi:10.1109/micro.1999.809466

Cited by 12 publications

(11 citation statements)

References 17 publications

“…Prior work has studied the limits of instruction-level parallelism under several idealizations, including a large or infinite instruction window, perfect branch prediction and memory disambiguation, and simple program transformations to remove unnecessary data dependences [4,9,18,20,24,42,49,57,74]. Similar to our limit study, these analyses find that parallelism is often plentiful (>1000×), but very large instruction windows are needed to exploit it (>100K instructions [42,57]).…”

Section: Additional Related Work (supporting)

confidence: 61%

A scalable architecture for ordered parallelism

Jeffrey, Subramanian, Yan et al. 2015

Proceedings of the 48th International Symposium on Microarchitecture

We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits. We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51-122× speedups over a single-core system, and outperforms software-only parallel algorithms by 3-18×.
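The citation statement above refers to limit studies that measure available parallelism as a function of instruction window size. As a rough illustration only, the following sketch computes an idealized instructions-per-cycle figure for a toy trace. The trace format, the function name `oracle_ilp`, and the model itself (unit latency, perfect branch prediction, full register renaming, in-order retirement from a fixed-size window) are assumptions made for this example and are not taken from any of the cited papers.

```python
from typing import Dict, List, Tuple

# Hypothetical trace format: (destination register, list of source registers).
Instr = Tuple[str, List[str]]


def oracle_ilp(trace: List[Instr], window: int) -> float:
    """Idealized IPC of `trace` with a window of `window` instructions.

    Assumed model: every instruction takes one cycle, only true (read-after-
    write) dependences matter, and instruction i may not execute until
    instruction i - window has retired (retirement is in program order).
    """
    ready: Dict[str, int] = {}   # register -> cycle in which it is produced
    retire: List[int] = []       # retire[i] = cycle by which 0..i have executed
    last_cycle = 0
    for i, (dest, srcs) in enumerate(trace):
        # Earliest cycle permitted by the data dependences.
        cycle = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        # Window constraint: wait for a slot freed by instruction i - window.
        if i >= window:
            cycle = max(cycle, retire[i - window] + 1)
        ready[dest] = cycle
        last_cycle = max(last_cycle, cycle)
        retire.append(last_cycle)
    return len(trace) / last_cycle if trace else 0.0


independent = [("r1", []), ("r2", []), ("r3", []), ("r4", [])]
chain = [("r1", []), ("r2", ["r1"]), ("r3", ["r2"])]
print(oracle_ilp(independent, 4))  # four independent instructions, one cycle
print(oracle_ilp(independent, 2))  # same trace, window of two
print(oracle_ilp(chain, 100))      # a dependence chain gains nothing
```

In this toy model, shrinking the window caps the measured parallelism even when the instructions are independent, which is the qualitative effect the quoted statement describes.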

“…As each VLIW tree region is translated, a number of optimizations are performed to enhance the available instruction parallelism. These include expansion of register-indirect branches into a series of conditional branches to increase scheduling opportunities [7], copy propagation, combining, load/store telescoping, and unification [8]. Speculation is used aggressively within a translation group, although results are committed in-order to the architected processor state to maintain precise exception behavior.…”

Section: Binary Translation Approach (mentioning)

confidence: 99%

“…DAISY/390 uses an alternative approach: instead of adding a guarding test to each translation unit, we use incremental dataflow analysis between blocks to minimize the need for code which checks compilation assumptions about the contents of base registers. When a block is translated, dataflow information for the current code block is generated for code optimization techniques performed at the code block level [8]. This includes information such as the constant propagation.…”

Section: Resolving Branch Target Addresses (mentioning)

confidence: 99%

Binary translation and architecture convergence issues for IBM system/390

Gschwind, Ebcioğlu, Altman et al. 2000

Proceedings of the 14th International Conference on Supercomputing

We describe the design issues in an implementation of the ESA/390 architecture based on binary translation to a very long instruction word (VLIW) processor. During binary translation, complex ESA/390 instructions are decomposed into instruction "primitives" which are then scheduled onto a wide-issue machine. The aim is to achieve high instruction level parallelism due to the increased scheduling and optimization opportunities which can be exploited by binary translation software, combined with the efficiency of long instruction word architectures. A further aim is to study the feasibility of a common execution platform for different instruction set architectures, such as ESA/390, RS/6000, AS/400 and the Java Virtual Machine, so that multiple systems can be built around a common execution platform.
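The first citation statement from this paper lists copy propagation among the optimizations applied during translation. A minimal sketch of copy propagation over one straight-line block follows. The instruction tuple format, the `"copy"` opcode name, and the function `copy_propagate` are hypothetical and chosen for illustration; this is a generic textbook formulation, not the DAISY implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format: (opcode, destination, source registers).
Instr = Tuple[str, str, List[str]]


def copy_propagate(block: List[Instr]) -> List[Instr]:
    """Replace uses of copied registers with the original source register."""
    copies: Dict[str, str] = {}   # copy destination -> register it mirrors
    out: List[Instr] = []
    for op, dest, srcs in block:
        # Rewrite each source through the currently valid copy relations.
        srcs = [copies.get(s, s) for s in srcs]
        # Writing dest invalidates every relation that mentions dest.
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if op == "copy" and srcs[0] != dest:
            copies[dest] = srcs[0]
        out.append((op, dest, srcs))
    return out


block = [
    ("copy", "r2", ["r1"]),
    ("add", "r3", ["r2", "r4"]),   # r2 can be read as r1 here
    ("load", "r1", ["r5"]),        # r1 is overwritten, so r2 no longer mirrors it
    ("add", "r6", ["r2", "r3"]),   # must keep reading r2
]
for instr in copy_propagate(block):
    print(instr)
```

After the rewrite, the first `add` no longer depends on the `copy`, so a scheduler is free to place the two in the same cycle; the `copy` itself stays in the block because a later instruction still reads `r2`.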

“…Finally, dynamic optimizers can perform profitable optimizations such as partial inlining of functions and conditional branch elimination that would be too expensive to perform statically. SDT systems that perform dynamic optimization include Dynamo, (5) DBT, (6) and Voss and Eigenmann's remote dynamic program optimization system. (7) Some of the binary translators previously described also perform some dynamic optimization (e.g., DAISY, FX!32, and Transmeta's Code Morphing).…”

Section: Introduction (mentioning)

confidence: 99%

Compile-Time Planning for Overhead Reduction in Software Dynamic Translators

Kumar, Childers, Williams et al. 2005

Int J Parallel Prog

Software dynamic translation (SDT) is a technology for modifying programs as they are running. The overhead of monitoring and modifying a running program's instructions is often substantial in SDT systems. As a result, SDT can be impractically slow, especially in SDT systems that do not or can not employ dynamic optimization to offset overhead. This is unfortunate since SDT has many advantages in modern computing environments and interesting uses of SDT continue to emerge. In this paper, we describe techniques to reduce the overhead of SDT. In particular, we present a compile-time planning technique to reduce the overhead due to indirect branch handling. Our results show that this technique is very effective and can improve SDT performance by up to 36%, with an average of 20%.
