We present the design and implementation of several optimizations and techniques included in the latest IBM Java TM Just-in-Time (JIT) Compiler. We first discuss some of the modifications we have applied to Sun Microsystems' reference implementation of the Java Virtual Machine (JVM TM) Specification to increase the performance, including a change in the object layout. We then describe each of the optimizations, referring to what had to be taken into account because of both the just-in-time nature of the compiler and the requirements of the Java language specification, such as exception checking. We also present code generation techniques targeting Intel architectures, describing the register allocation schemes, exception handling, and code scheduling. Finally we report on the performance of the IBM JIT compiler, showing both the effectiveness of the individual optimizations and the competitive overall performance of the JIT compiler in comparison with a competitor, using industry-standard benchmarking programs. All the techniques presented here are included in the official product (JIT Compiler version 3.0), which has been integrated into the IBM Developer Kit for Windows TM , Java Technology Edition, Version 1.1.7.
The Java language incurs a runtime overhead for exception checks and object accesses without an interior pointer in order to ensure safety. It also requires type inclusion test, dynamic class loading, and dynamic method calls in order to ensure flexibility. A "JustIn-Time" (JIT) compiler generates native code from Java byte code at runtime. It must improve the runtime performance without compromising the safety and flexibility of the Java language. We designed and implemented effective optimizations for the JIT compiler, such as exception check elimination, common subexpression elimination, simple type inclusion test, method inlining, and resolution of dynamic method call. We evaluate the performance benefits of these optimizations based on various statistics collected using SPECjvm98 and two JavaSoft applications with byte code sizes ranging from 20000 to 280000 bytes. Each optimization contributes to an improvement in the performance of the programs.
Many devirtualization techniques have been proposed to reduce the runtime overhead of dynamic method calls for various objectoriented languages, however, most of them are less effective or cannot be applied for Java in a straightforward manner. This is partly because Java is a statically-typed language and thus transforming a dynamic call to a static one does not make a tangible performance gain (owing to the low overhead of accessing the method table) unless it is inlined, and partly because the dynamic class loading feature of Java prohibits the whole program analysis and optimizations from being applied.We propose a new technique called direct devirtualization with the code patching mechanism. For a given dynamic call site, our compiler first determines whether the call can be devirtualized, by analyzing the current class hierarchy. When the call is devirtualizable and the target method is suitably sized, the compiler generates the inlined code of the method, together with the backup code of making the dynamic call. Only the inlined code is actually executed until our assumption about the devirtualization becomes invalidated, at which time the compiler performs code patching to make the backup code executed subsequently. Since the new technique prevents some code motions across the merge point between the inlined code and the backup code, we have furthermore implemented recently-known analysis techniques, such as type analysis and preexistence analysis, which allow the backup code to be completely eliminated. We made various experiments using 16 real programs to understand the effectiveness and characteristics of the devirtualization techniques in our Java Just-InTime (JIT) compiler. In summary, we reduced the number of dynamic calls by ranging from 8.9% to 97.3% (the average of 40.2%), and we improved the execution performance by ranging from -1% to 133% (with the geometric mean of 16%).
The Cell Broadband Enginee processor employs multiple accelerators, called synergistic processing elements (SPEs), for high performance. Each SPE has a highspeed local store attached to the main memory through direct memory access (DMA), but a drawback of this design is that the local store is not large enough for the entire application code or data. It must be decomposed into pieces small enough to fit into local memory, and they must be replaced through the DMA without losing the performance gain of multiple SPEs. We propose a new programming model, MPI microtask, based on the standard Message Passing Interface (MPI) programming model for distributed-memory parallel machines. In our new model, programmers do not need to manage the local store as long as they partition their application into a collection of small microtasks that fit into the local store. Furthermore, the microtasks by exploiting explicit communications in the MPI model. We have created a prototype that includes a novel static scheduler for such optimizations. Our initial experiments have shown some encouraging results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.