Mircea Namolaru scite author profile

Tuning compiler optimizations for rapidly evolving hardware makes porting and extending an optimizing compiler for each new platform extremely challenging. Iterative optimization is a popular approach to adapting programs to a new architecture automatically using feedback-directed compilation. However, the large number of evaluations required for each program has prevented iterative compilation from widespread take-up in production compilers. Machine learning has been proposed to tune optimizations across programs systematically but is currently limited to a few transformations, long training phases and critically lacks publicly released, stable tools.Our approach is to develop a modular, extensible, self-tuning optimization infrastructure to automatically learn the best optimizations across multiple programs and architectures based on the correlation between program features, run-time behavior and optimizations. In this paper we describe Milepost GCC, the first publicly-available open-source machine learning-based compiler. It consists of an Interactive Compilation Interface (ICI) and plugins to extract program features and exchange optimization data with the cTuning.org open public repository. It automatically adapts the internal optimization heuristic at function-level granularity to improve execution time, code size and compilation time of a new program on a given architecture. Part of the Milepost technology together with low-level ICI-inspired plugin framework is now included in the mainline GCC.We developed machine learning plugins based on probabilistic and transductive approaches to predict good combinations of optimizations. Our preliminary experimental results show that it is possible to automatically reduce the execution time of individual MiBench programs, some by more than a factor of 2, while also improving compilation 1 INRIA Saclay, France (HiPEAC member) · 2 University of Versailles Saint Quentin en Yvelines, France · 3 IBM Haifa, Israel (HiPEAC member) · 4 CAPS Entreprise, France (HiPEAC member) · 5 ARC International, UK · 6 University of Edinburgh, UK (HiPEAC member) · 2 time and code size. On average we are able to reduce the execution time of the MiBench benchmark suite by 11% for the ARC reconfigurable processor. We also present a realistic multi-objective optimization scenario for Berkeley DB library using Milepost GCC and improve execution time by approximately 17%, while reducing compilation time and code size by 12% and 7% respectively on Intel Xeon processor.

show abstract

Practical aggregation of semantical program properties for machine learning based optimization

Namolaru

Cohen

Fursin

et al. 2010

View full text Add to dashboard Cite

Iterative search combined with machine learning is a promising approach to design optimizing compilers harnessing the complexity of modern computing systems. While traversing a program optimization space, we collect characteristic feature vectors of the program, and use them to discover correlations across programs, target architectures, data sets, and performance. Predictive models can be derived from such correlations, effectively hiding the time-consuming feedback-directed optimization process from the application programmer.One key task of this approach, naturally assigned to compiler experts, is to design relevant features and implement scalable feature extractors, including statistical models that filter the most relevant information from millions of lines of code. This new task turns out to be a very challenging and tedious one from a compiler construction perspective. So far, only a limited set of ad-hoc, largely syntactical features have been devised. Yet machine learning is only able to discover correlations from information it is fed with: it is critical to select topical program features for a given optimization problem in order for this approach to succeed.We propose a general method for systematically generating numerical features from a program. This method puts no restrictions on how to logically and algebraically aggregate semantical properties into numerical features. We illustrate our method on the difficult problem of selecting the best possible combination of 88 available optimizations in GCC. We achieve 74% of the potential speedup obtained through iterative compilation on a wide range of benchmarks and four different general-purpose and embedded architectures. Our work is particularly relevant to embedded system designers willing to quickly adapt the optimization heuristics of a mainstream compiler to their custom ISA, microarchitecture, benchmark suite and workload. Our method has been integrated with the publicly released MILEPOST GCC [14].

show abstract

Contributions to the GNU Compiler Collection

et al. 2005

View full text Add to dashboard Cite

Compiling for an indirect vector register architecture

Nuzman

Namolaru

Zaks

et al. 2008

View full text Add to dashboard Cite

The iVMX architecture contains a novel vector register file of up to 4096 vector registers accessed indirectly via a mapping mechanism, providing compatibility with the VMX architecture, and potential for dramatic performance benefits [7]. The large number of vector registers and the unique indirection mechanism pose compilation challenges to be used efficiently: the indirection mechanism emphasizes spatial locality of registers and interaction among destination and source operands during register allocation, and the many vector registers call for aggressive automatic vectorization.This work is a first step in addressing the compilability of iVMX, following the presentation and validation of its architectural aspects [7]. In this paper we present several compilation approaches to deal with the mapping mechanism and an outer-loop vectorization transformation developed to promote the use of many vector registers. We modified an existing register allocator to target all available registers and added a post-pass to rename live-ranges considering spatial locality and interaction among operand types. An FIR filter is used to demonstrate the effectiveness of the techniques developed compared to a version hand-optimized for iVMX. Initial results show that we can reduce the overhead of map management down to 29% of the total instruction count, compared to 22% obtained manually, and compared to 49% obtained using a naive scheme, while outperforming an equivalent VMX implementation by a factor of 2.

show abstract

Recursion, Probability, Convolution and Classification for Computations

Namolaru¹,

Goubier²

2019

Preprint

View full text Add to dashboard Cite

The main motivation of this work was practical, to offer computationally and theoretical scalable ways to structuring large classes of computation. It started from attempts to optimize R code for machine learning/artificial intelligence algorithms for huge data sets, that due to their size, should be handled into an incremental (online) fashion. Our target are large classes of relational (attribute based), mathematical (index based) or graph computations. We wanted to use powerful computation representations that emerged in AI (artificial intelligence)/ML (machine learning) as BN (Bayesian networks) and CNN (convolution neural networks). For the classes of computation addressed by us, and for our HPC (high performance computing) needs, the current solutions for translating computations into such representation need to be extended. Our results show that the classes of computation targeted by us, could be treestructured, and a probability distribution (defining a DBN, i.e. Dynamic Bayesian Network) associated with it. More ever, this DBN may be viewed as a recursive CNN (Convolution Neural Network). Within this tree-like structure, classification in classes with size bounded (by a parameterizable may be performed. These results are at the core of very powerful, yet highly practically algorithms for restructuring and parallelizing the computations. The mathematical background required for an in depth presentation and exposing the full generality of our approach) is the subject of a subsequent paper. In this paper, we work in an limited (but important) framework that could be understood with rudiments of linear algebra and graph theory. The focus is in applicability, most of this paper discuss the usefulness of our approach for solving hard compilation problems related to automatic parallelism.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mircea Namolaru

Milepost GCC: Machine Learning Enabled Self-tuning Compiler

Practical aggregation of semantical program properties for machine learning based optimization

Contributions to the GNU Compiler Collection

Compiling for an indirect vector register architecture

Recursion, Probability, Convolution and Classification for Computations

Contact Info

Product

Resources

About