Applying the right compiler optimizations to a particular program can have a significant impact on program performance. Due to the non-linear interaction of compiler optimizations, however, determining the best setting is nontrivial. Several techniques have been proposed that search the space of compiler options to find good solutions; however, such approaches can be expensive. This paper proposes a different approach, using performance counters as a means of determining good compiler optimization settings. This is achieved by learning a model off-line which can then be used to determine good settings for any new program. We show that such an approach outperforms the state-of-the-art and is two orders of magnitude faster on average. Furthermore, we show that our performance counter-based approach outperforms techniques based on static code features. Using our technique we achieve a 17% improvement over the highest optimization setting of the commercial PathScale EKOPath 2.3.1 optimizing compiler on the SPEC benchmark suite on a recent AMD Athlon 64 3700+ platform.
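A minimal sketch of the idea described in this abstract, with entirely hypothetical data: each program is represented by a vector of hardware performance-counter readings, and a simple nearest-neighbour model (one plausible choice; the paper's exact model may differ) maps a new program to the flag setting that worked best for its most similar training program. Assumes scikit-learn; the counter features and flag strings are illustrative placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training set: counter vectors (e.g. cache misses, branch
# mispredictions, TLB misses per instruction) for previously seen programs.
X_train = np.array([
    [0.021, 0.004, 0.0003],
    [0.090, 0.012, 0.0011],
    [0.015, 0.030, 0.0002],
])
# Best-performing flag combination previously found for each program
# (example flag strings only).
best_flags = ["-O3 -funroll-loops", "-O2 -ftree-vectorize", "-O3"]

# Learn the counter-vector -> flag-setting mapping off-line.
model = KNeighborsClassifier(n_neighbors=1).fit(X_train, best_flags)

# Counters collected from one baseline run of a new, unseen program.
new_program = np.array([[0.025, 0.005, 0.0004]])
print(model.predict(new_program))  # predicted good flag setting
```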
Determining the best set of optimizations to apply to a kernel to be executed on the graphics processing unit (GPU) is a challenging problem. There are large sets of possible optimization configurations that can be applied, and many applications have multiple kernels. Each kernel may require a specific configuration to achieve the best performance, and moving an application to new hardware often requires a new optimization configuration for each kernel. In this work, we apply optimizations to GPU code using HMPP, a high-level directive-based language and source-to-source compiler that can generate CUDA/OpenCL code. However, programming with high-level languages may mean a loss of performance compared to using low-level languages. Our work shows that it is possible to improve the performance of a high-level language by using auto-tuning. We perform auto-tuning on a large optimization space on GPU kernels, focusing on loop permutation, loop unrolling, tiling, and specifying which loop(s) to parallelize, and show results on convolution kernels, codes in the PolyBench suite, and an implementation of belief propagation for stereo vision. The results show that our auto-tuned HMPP-generated implementations are significantly faster than the default HMPP implementation and can meet or exceed the performance of manually coded CUDA/OpenCL implementations.
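A toy sketch of the auto-tuning loop this abstract describes. As a stand-in for compiling and timing a GPU kernel, it tunes the tile size of a blocked matrix multiply in pure Python/NumPy; the search structure (enumerate configurations, time each, keep the best) is the same, but the configuration space and measurement harness here are illustrative assumptions, not the paper's setup.

```python
import time
import numpy as np

def blocked_matmul(A, B, tile):
    """Tiled matrix multiply; `tile` is the tunable configuration."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            for k in range(0, n, tile):
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

n = 256
A, B = np.random.rand(n, n), np.random.rand(n, n)

# Auto-tuning loop: try each configuration, measure, keep the fastest.
best = None
for tile in [16, 32, 64, 128, 256]:  # the (toy) optimization space
    t0 = time.perf_counter()
    blocked_matmul(A, B, tile)
    elapsed = time.perf_counter() - t0
    if best is None or elapsed < best[1]:
        best = (tile, elapsed)

print(f"best tile size: {best[0]} ({best[1]:.4f}s)")
```

In a real GPU setting, each candidate configuration (loop order, unroll factor, tile size, parallelized loops) would be applied via HMPP directives, recompiled, and timed on the device instead.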
Since the mid-1990s, researchers have been trying to use machine-learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and, finally, the influential papers of the field.

... unrolling, register allocation, etc.) could substantially benefit several performance metrics. Depending on the objectives, these metrics could be execution time, code size, or power consumption. A holistic exploration approach to trade off these metrics also represents a challenging problem [193]. Autotuning [35, 256] addresses automatic code generation and optimization by using different scenarios and architectures. It constructs techniques for automatic optimization of different parameters to maximize or minimize the satisfiability of an objective function. Historically, several optimizations were done in the backend, where scheduling, resource allocation, and code generation are done [56, 93]. The constraints and resources form a linear system (ILP) that needs to be solved. Recently, researchers have shown increased effort in introducing front-end and IR optimizations. Two observations support this claim: (1) the complexity of a backend compiler requires exclusive knowledge held strictly by the compiler designers, and (2) external compiler modifications carry lower overheads than back-end modifications. The IR-optimization process normally involves fine-tuning compiler optimization parameters through a multi-objective optimization formulation, which can be harder to explore. Nonetheless, each approach has its benefits and drawbacks and is subject to analysis under its own scope. A major challenge in choosing the right set of compiler optimizations is the fact that these code optimizations are programming-language, application, and architecture dependent. Additionally, the word optimization is a misnomer: there is no guarantee the transformed code will perform better than the original version. In fact, aggressive optimizations can even degrade the performance of the code to which they are applied [251]. Understanding the behavior of the optimization...
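A hedged sketch of the kind of iterative-compilation harness this fine-tuning of optimization parameters implies: try every subset of a few (real) GCC optimization flags on a C source file, time the resulting binary, and keep the fastest configuration. The flag list is a small example, and `gcc` on PATH plus a self-timed `bench.c` benchmark are assumed placeholders for whatever toolchain and workload is actually used.

```python
import itertools
import subprocess
import time

# Real GCC flags, chosen only as examples of a small optimization space.
FLAGS = ["-funroll-loops", "-ftree-vectorize",
         "-finline-functions", "-fomit-frame-pointer"]

def measure(flags):
    """Compile bench.c with the given flags and time one run of it."""
    subprocess.run(["gcc", "-O2", *flags, "bench.c", "-o", "bench"],
                   check=True)
    t0 = time.perf_counter()
    subprocess.run(["./bench"], check=True)
    return time.perf_counter() - t0

best = None
# Exhaustive over 2^4 = 16 subsets; random sampling or a learned model
# (as in the surveyed work) is needed once the space grows larger.
for r in range(len(FLAGS) + 1):
    for subset in itertools.combinations(FLAGS, r):
        elapsed = measure(list(subset))
        if best is None or elapsed < best[1]:
            best = (subset, elapsed)

print("best flag subset:", best[0], f"({best[1]:.3f}s)")
```

Note that the empty subset is included deliberately: as the excerpt above warns, an "optimization" flag can degrade performance, so the plain -O2 baseline may win.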