A predecoding technique for ILP exploitation in Java processors

Sideris, Isidoros; Pekmestzi, Kiamal; Economakos, George

doi:10.1016/j.sysarc.2008.01.008

Cited by 4 publications

(5 citation statements)

References 8 publications

“…More precisely, its delay accounted for about 33.8% of the critical path delay, approximately 49.4% of which was consumed in the logic to identify the source register operands. Also, we found that we could increase the operating frequency by up to 16.6% if we remove the dependence logic from the critical path.…”

Section: Introduction (mentioning)

confidence: 89%

Reducing the Delay for Decoding Instructions by Predicting Their Source Register Operands

et al. 2020

Fetched instructions may have data dependencies on in-flight instructions in a processor pipeline, and these dependencies prevent the processor from executing the incoming instructions so that program correctness is guaranteed. Register and memory dependencies are detected in the decode and memory stages, respectively. In a small embedded processor that supports as many ISAs as possible to reduce code size, decoding instructions to identify register usage for the dependence check generally results in a long delay and sometimes forms a critical path in the implementation. To reduce this delay, this paper proposes two methods. The first method assumes the widely used source register operand bit-fields without fully decoding the instructions. However, this assumption can cause additional stalls due to incorrect predictions and thus degrade performance. To solve this problem, the second method adopts a table-based approach that stores the dependence history and later uses this information to predict dependencies more precisely. We applied our methods to the commercial EISC embedded processor with the Samsung 65nm process; we reduced the critical path delay, increased the maximum operating frequency by 12.5%, and achieved an average 11.4% speed-up in the execution time of the EEMBC applications. We also improved static power consumption, dynamic power consumption, and EDP by 7.2%, 8.5%, and 13.6%, respectively, despite an implementation area overhead of 2.5%.

“…Another use of the pre-decoding is for avoiding the repeated decoding the same instruction. The authors of Reference [16] proposed a hardware folding technique that dynamically transforms Java bytecodes groups into RISC instructions, storing them in a cache to enable reuse.…”

Section: Related Work (mentioning)

confidence: 99%

Reducing the Delay for Decoding Instructions by Predicting Their Source Register Operands

et al. 2020

“…The peripheral performs static translation per superblock, using a modification [5] of the OPEX algorithm [3]. The core folding algorithm maintains a queue of recently fetched Java bytecodes and it tries to find folding groups, even if they are nested.…”

Section: Translation Technique (mentioning)

confidence: 99%

“…The hardware peripheral performs stack folding of Java bytecode sequences based on OPEX [3], a recursive stack folding algorithm, which translates bytecodes to RISC instructions. The acceleration hardware is placed in the processor bus and it is managed as a peripheral, in contrast with other solutions which put the hardware between the instruction cache and the processor [5]. This placement is more flexible and up to a certain point independent of the specific microarchitecture.…”

Section: Introduction (mentioning)

confidence: 99%

A hardware peripheral for Java bytecodes translation acceleration

Sideris, Moshopoulos, Pekmestzi 2010

Proceedings of the 2010 ACM Symposium on Applied Computing

Self Cite

Java has gained great popularity in a wide range of applications. Just-in-time compilation is crucial for efficient execution of Java programs, but it is generally a hard task that is not well suited to embedded systems. This paper presents a hardware acceleration unit for efficient execution of JIT translation in embedded SoCs. The translation is limited to first-level optimizations, which include translation of Java bytecodes to native RISC instructions (stack folding). For experimentation, we developed a SoC with an ARM7TDMI processor. In a 180nm ASIC technology at an 80MHz clock, we measured a speedup of up to 4 times over software-only JIT translation.
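
The stack folding mentioned in this abstract can be illustrated with a minimal software sketch. This is not the OPEX algorithm or the authors' hardware: the bytecode subset, the mapping of local variable n to register rn, the temporaries t0, t1, ..., and the function name `fold` are all assumptions made for illustration. The sketch tracks the operand stack symbolically, so loads emit no instruction and a store that directly follows an arithmetic operation is folded into that operation's destination.

```python
# Illustrative sketch only: a tiny symbolic stack folder, NOT the OPEX algorithm.
# Assumptions: local variable n lives in register rn; temporaries are named tN.

BINARY_OPS = {"iadd": "add", "isub": "sub", "imul": "mul"}


def fold(bytecodes):
    """Translate (mnemonic, operand) bytecode pairs into RISC-like text."""
    stack = []      # symbolic operand stack: register names or immediates
    out = []        # emitted instructions as [op, dst, src...]
    next_tmp = 0
    for op, arg in bytecodes:
        if op == "iload":
            stack.append(f"r{arg}")        # producer: nothing is emitted
        elif op == "iconst":
            stack.append(f"#{arg}")        # constant becomes an immediate
        elif op in BINARY_OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            dst = f"t{next_tmp}"
            next_tmp += 1
            out.append([BINARY_OPS[op], dst, lhs, rhs])
            stack.append(dst)
        elif op == "istore":
            src = stack.pop()
            dst = f"r{arg}"
            if dst in stack:
                # The old value of this local is still pending on the stack;
                # a real folder would spill it first. Not handled here.
                raise NotImplementedError("pending use of overwritten local")
            if out and src.startswith("t") and out[-1][1] == src:
                out[-1][1] = dst           # fold the store into the producer
            else:
                out.append(["mov", dst, src])
        else:
            raise ValueError(f"unsupported bytecode: {op}")
    return [f"{ins[0]} {', '.join(ins[1:])}" for ins in out]


if __name__ == "__main__":
    # Local 3 = local 1 + local 2: four bytecodes fold into one instruction.
    simple = [("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3)]
    print(fold(simple))    # ['add r3, r1, r2']

    # Local 0 = (local 1 + local 2) * local 3: a nested group.
    nested = [("iload", 1), ("iload", 2), ("iadd", None),
              ("iload", 3), ("imul", None), ("istore", 0)]
    print(fold(nested))    # ['add t0, r1, r2', 'mul r0, t0, r3']
```

In the first case the two loads, the add, and the store collapse into a single three-address instruction, which is the effect a folding unit aims for; the second case shows an inner group whose result feeds an outer one through a temporary.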

“…This grouping is based on the OPEX algorithm [52]. The acceleration peripheral is placed on the processor bus and is controlled by the processor, in contrast with other solutions in which the translation hardware is inserted between the instruction cache and the processor [88,96]. This placement is more flexible and, to a certain degree, independent of the specific microarchitecture.…”

Section: Introduction (unclassified)

Σχεδίαση και αποδοτική υλοποίηση της εικονικής μηχανής Java για πολυμεσικές εφαρμογές (Design and efficient implementation of the Java virtual machine for multimedia applications)

Sideris

Copying, storing, and distributing this work, in whole or in part, for commercial purposes is prohibited. Reprinting, storing, and distributing it for non-profit, educational, or research purposes is permitted, provided that the source is cited and this notice is retained. Questions concerning the use of this work for profit should be addressed to the author. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official positions of the National Technical University of Athens.
