HDTrans

Sridhar, Swaroop; Shapiro, Jonathan S.; Bungale, Prashanth P.

doi:10.1145/1241601.1241602

Cited by 21 publications

(3 citation statements)

References 5 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…DynamoRIO [8], Pin [19], Valgrind [20], HDTrans [23] and fastBT [21] are DBTs designed for program instrumentation which have the same host and source architectures. The last two do not perform register state recovery when translating signals, and thus may cause some applications to malfunction.…”

Section: Related Workmentioning

confidence: 99%

Low overhead dynamic binary translation on ARM

et al. 2017

View full text Add to dashboard Cite

The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity.We present MAMBO-X64, a dynamic binary translator for Linux which executes 32-bit ARM binaries using only the AArch64 instruction set. We have evaluated the performance of MAMBO-X64 on three existing ARMv8 processors which support both AArch32 and AArch64 instruction sets. The performance was measured by comparing the running time of 32-bit benchmarks running under MAMBO-X64 with the same benchmark running natively. On SPEC CPU2006, we achieve a geometric mean overhead of less than 7.5 % on in-order Cortex-A53 processors and a performance improvement of 1 % on out-of-order X-Gene 1 processors.MAMBO-X64 achieves such low overhead by novel optimizations to map AArch32 floating-point registers to AArch64 registers dynamically, handle overflowing address calculations efficiently, generate traces that harness hardware return address prediction, and handle operating system signals accurately.

show abstract

Section: Related Workmentioning

confidence: 99%

Low overhead dynamic binary translation on ARM

et al. 2017

View full text Add to dashboard Cite

show abstract

“…Although there is a large number of publications that quantify the overhead in DBT architectures [20,50,22,10,26] the authors do not classify the source of this overhead. Other studies focus on one specific overhead: interpretation overhead [50,68], branch resolution [70], indirect branches overhead [32] etc. Our results are more detailed as they are not limited to the global overhead but show the breakdown.…”

Section: Related Workmentioning

confidence: 99%

Performance simulation methodologies for hardware/software co-designed processors

Brankovic¹

View full text Add to dashboard Cite

Recently the community started looking into Hardware/Software (HW/SW) co-designed processors as potential solutions to move towards the less power consuming and the less complex designs. Unlike other solutions, they reduce the power and the complexity doing so called dynamic binary translation and optimization from a guest ISA to an internal host custom ISA. This thesis tries to answer the question on how to simulate this kind of architectures. For any kind of processor's architecture, the simulation is the common practice, because it is impossible to build several versions of hardware in order to try all alternatives. The simulation of HW/SW co-designed processors has a big issue in comparison with the simulation of traditional HW-only architectures. First of all, open source tools do not exist. Therefore researches many times assume that the software layer overhead, which is in charge for dynamic binary translation and optimization, is constant or ignored. In this thesis we show that such an assumption is not valid and that can lead to very inaccurate results. Therefore including the software layer in the simulation is a must. On the other side, the simulation is very slow in comparison to native execution, so the community spent a big effort on delivering accurate results in a reasonable amount of time. Therefore it is the common practice for HW-only processors that only parts of application stream, which are called samples, are simulated. Samples usually correspond to different phases in the application stream and usually they are no longer than a few million of instructions. In order to archive accurate starting state of each sample, microarchitectural structures are warmed-up for a few million instructions prior to samples instructions. Unfortunately, such a methodology cannot be directly applied for HW/SW co-designed processors. The warm-up for HW/SW co-designed processors needs to be 3-4 orders of magnitude longer than the warm-up needed for traditional HW-only processor, because the warm-up of software layer needs to be longer than the warm-up of hardware structures. To overcome such a problem, in this thesis we propose a novel warm-up technique specialized for HW/SW co-designed processors. Our solution reduces the simulation time by at least 65X with an average error of just 0.75\%. Such a trend is visible for different software and hardware configurations. The process used to determine simulation samples cannot be applied to HW/SW co-designed processors as well, because due to the software layer, samples show more dissimilarities than in the case of HW-only processors. Therefore we propose a novel algorithm that needs 3X less number of samples to achieve similar error like the state of the art algorithms. Again, such a trend is visible for different software and hardware configurations. Els processadors co-dissenyats Hardware/Software (HW/SW co-designed processors) han estat proposats per l'acadèmia i la indústria com a solucions potencials per a fabricar processadors menys complexos i que consumeixen menys energia. A diferència d'altres alternatives, aquest tipus de processadors redueixen la complexitat i el consum d'energia aplicant traducció y optimització dinàmica de binaris des d'un repertori d'instruccions (instruction set architecture) extern cap a un repertori d'instruccions intern adaptat. Aquesta tesi intenta resoldre els reptes relacionats a la simulació d'aquest tipus d'arquitectures. La simulació és un procés comú en el disseny i desenvolupament de processadors ja que permet explorar diverses alternatives sense haver de fabricar el hardware per a cadascuna d'elles. La simulació de processadors co-dissenyats Hardware/Software és un procés més complex que la simulació de processadores tradicionals, purament hardware. Per exemple, no existeixen eines de simulació disponibles per a la comunitat. Per tant, els investigadors acostumen a assumir que la capa de software, que s'encarrega de la traducció i optimització de les aplicacions, no té un pes específic i, per tant, uns costos computacionals baixos o constants en el millor dels casos. En aquesta tesis demostrem que aquestes premisses són incorrectes i que els resultats amb aquestes acostumen a ser molt imprecisos. Una primera conclusió d'aquesta tesi doncs és que la simulació de la capa software és totalment necessària. A més a més, degut a que els processos de simulació són lents, s'han proposat tècniques de simulació que intenten obtenir resultats precisos en el menor temps possible. Una pràctica habitual és la simulació només de parts de les aplicacions, anomenades mostres, en el disseny de processadors convencionals, purament hardware. Aquestes mostres corresponen a diferents fases de les aplicacions i acostumen a ser de pocs milions d'instruccions. Per tal d'aconseguir un estat microarquitectònic acurat per a cadascuna de les mostres, s'acostumen a estressar aquestes estructures microarquitectòniques del simulador abans de començar a extreure resultats, procés anomenat "escalfament" (warm-up). Desafortunadament, aquesta metodologia no pot ser aplicada a processadors co-dissenyats Hardware/Software. L'"escalfament" de les estructures internes del simulador en el disseny de processadores co-dissenyats Hardware/Software són 3-4 ordres de magnitud més gran que el mateix procés d' "escalfament" en simulacions de processadors convencionals, ja que en els primers cal "escalfar" també les estructures i l'estat de la capa software. En aquesta tesi proposem tècniques de simulació basades en l' "escalfament" de les estructures que redueixen el temps de simulació en 65X amb un error mig del 0,75%. Aquests resultats són extrapolables a diferents configuracions del hardware i de la capa software. Finalment, les tècniques convencionals de selecció de mostres d'aplicacions a simular no són aplicables tampoc a la simulació de processadors co-dissenyats Hardware/Software degut a que les mostres es comporten de manera molt diferent quan es té en compte la capa software. En aquesta tesi, proposem un nou algorisme que redueix 3X el nombre de mostres a simular comparat amb els algorismes tradicionals per a processadors convencionals per a obtenir un error similar. Aquests resultats també són extrapolables a diferents configuracions de hardware i de software. En conclusió, en aquesta tesi es respon al repte de com simular processadors co-dissenyats Hardware/Software, que són una alternativa al disseny tradicional de processadors. Hem demostrat que cal simular la capa software i s'han proposat noves tècniques i algorismes eficients d' "escalfament" i selecció de mostres que són tolerants a diferents configuracions

show abstract

“…Sridhar et al [35,36] describe HDTrans, a dynamic binary translator that is performanceefficient, although it does not employ any code optimization technique. One of the reasons for its efficiency is the introduction of two novel indirect branch emulation techniques: the Sieve, for register-indirect jumps, and the Return Cache for return instructions (see Section 4.6 and Section 4.9).…”

Section: Ibtcmentioning

confidence: 99%

Indirect branch emulation techniques in virtual machines

Gomes¹

View full text Add to dashboard Cite

Dynamic binary translation is an emulation technique commonly employed in the implementation of virtual machines. One of the main sources of overhead that hinder the applicability of dynamic binary translators is that caused by the emulation of indirect branch instructions. This master thesis describes several techniques that try to improve the performance and efficiency of indirect branch emulation in efficient virtual machines. DynamoRIO is one of such machines and it implements features used by several of those techniques. In this master thesis, we present current implementations of DynamoRIO, modify its code to include two new techniques (Inline Caching and IBTC) and compare it with other techniques described in the literature. ix ResumoTradução dinâmica de binários é uma técnica de emulação comumente utilizada na implementação de máquinas virtuais. Neste contexto, a emulação de saltos indiretos é uma das principais fontes de perda de eficiência, o que atrapalha a aplicabilidade de tradutores dinâmicos de binários. Essa dissertação descreve diversas técnicas que tentam melhorar o desempenho e a eficiência da emulação de saltos indiretos em máquinas virtuais eficientes. O DynamoRIO é uma máquina virtual que se enquadra nessa categoria e que utiliza características de diversas dessas técnicas. Nessa dissertação, nós apresentamos a implementação atual do DynamoRIO, modificamos seu código para incluir duas novas técnicas de emulação de saltos indiretos (Inline Caching e IBTC) e as comparamos com outras técnicas descritas na literatura.xi

show abstract

HDTrans

Cited by 21 publications

References 5 publications

Low overhead dynamic binary translation on ARM

Low overhead dynamic binary translation on ARM

Performance simulation methodologies for hardware/software co-designed processors

Indirect branch emulation techniques in virtual machines

Contact Info

Product

Resources

About