This paper presents a fundamentally new approach to global register allocation that optimally allocates registers and optimally places spill code, significantly decreasing spill code overhead compared with the traditional graph-coloring approach. The Optimal Register Allocation (ORA) approach formulates global register allocation as a 0-1 integer programming problem, incorporating all aspects of register allocation within a unified framework, including copy elimination, live range splitting, rematerialization, callee and caller register spilling, special instruction-operand requirements, and paired registers. A prototype ORA allocator is built into the GNU C Compiler (GCC). For the SPEC92 integer benchmarks, the ORA allocator produces a net decrease of more than 100 million cycles across the entire benchmark set, because the dynamic copies the ORA allocator removes exceed the dynamic loads and stores that it inserts. In contrast, the GCC allocator and a Chaitin-style graph-coloring allocator each cause a net increase of more than 1 billion cycles. Because global register allocation is NP-complete, optimal register allocation has been considered intractable. However, the run-time complexity of the ORA approach is shown experimentally to be O(n^3). A profile-guided hybrid allocation approach is proposed that uses the ORA allocator for the performance-critical regions of the performance-critical functions, while using a graph-coloring allocator for the noncritical functions and regions. An ORA-GCC hybrid allocator takes an average of 4.6 seconds per function to produce an allocation that is within 1% of optimal for 97% of the SPEC92 integer benchmark functions, showing that the hybrid allocator is practical as an advanced optimization for performance-critical codes.

A register allocator manages the contents of the target processor's small register file. During an initial instruction scheduling phase, a typical compiler uses an unlimited number of symbolic [...]

1. The register allocator does not reorder instructions.
2. The register allocator adds only spill load instructions, spill store instructions, and rematerialization instructions.
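To make the idea of a 0-1 formulation concrete, here is a minimal sketch of a toy allocation problem solved by exhaustive enumeration. It is not the ORA model, which also decides where spill loads, spill stores, and rematerialization instructions are placed. Here each symbolic register is simply either assigned one real register or spilled for its whole lifetime. The function name `allocate`, the `spill_cost` table, and the `interference` set are all hypothetical names chosen for illustration.

```python
from itertools import product


def allocate(spill_cost, interference, num_regs):
    """Exhaustively solve a toy 0-1 register allocation problem.

    spill_cost:   dict mapping a symbolic register name to the cost of
                  keeping it in memory (assumed, illustrative numbers)
    interference: set of frozenset({a, b}) pairs that are live at the
                  same time and so cannot share a real register
    num_regs:     number of real registers available
    """
    names = sorted(spill_cost)
    # Each symbolic register picks one real register or None (spilled).
    # This is equivalent to 0-1 variables x[v][r] with sum_r x[v][r] <= 1.
    choices = list(range(num_regs)) + [None]
    best_cost, best_assign = None, None
    for combo in product(choices, repeat=len(names)):
        assign = dict(zip(names, combo))
        # Constraint: interfering values must not share a real register.
        feasible = all(
            assign[a] is None or assign[a] != assign[b]
            for a, b in (tuple(sorted(pair)) for pair in interference)
        )
        if not feasible:
            continue
        # Objective: total cost of the values left in memory.
        cost = sum(spill_cost[n] for n in names if assign[n] is None)
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


if __name__ == "__main__":
    # Three mutually interfering values competing for two registers.
    costs = {"a": 10, "b": 3, "c": 7}
    edges = {frozenset("ab"), frozenset("bc"), frozenset("ac")}
    print(allocate(costs, edges, num_regs=2))
```

Enumeration is exponential in the number of symbolic registers, so this only illustrates the shape of the variables, constraints, and objective. A practical allocator would hand the same kind of model to an integer programming solver.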
This article presents the first optimal algorithm for trace scheduling. The trace is a global scheduling region used by compilers to exploit instruction-level parallelism across basic block boundaries. Several heuristic techniques have been proposed for trace scheduling, but the precision of these techniques has not been studied relative to optimality. This article describes a technique for finding provably optimal trace schedules, where optimality is defined in terms of a weighted sum of schedule lengths across all code paths in a trace. The optimal algorithm uses branch-and-bound enumeration to efficiently explore the entire solution space. Experimental evaluation of the algorithm shows that, with a time limit of 1 s per problem, 91% of the hard trace scheduling problems in the SPEC CPU 2006 Integer Benchmarks are solved optimally. For 58% of these hard problems, the optimal schedule is improved compared to that produced by a heuristic scheduler with a geometric mean improvement of 3.2% in weighted schedule length and 18% in compensation code size.
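The objective described above, a weighted sum of schedule lengths over the code paths of a trace, can be illustrated with a small branch-and-bound sketch. This is not the article's algorithm: it assumes a single-issue machine, ignores compensation code and speculation limits, and uses a deliberately simple lower bound. The function `best_trace_schedule` and its parameters are hypothetical, and the weights are illustrative execution counts for each exit branch.

```python
def best_trace_schedule(latency, preds, exit_weight):
    """Find an instruction order minimizing the weighted exit cycles.

    latency:     dict instr -> cycles until its result can be used
    preds:       dict instr -> list of instructions it must follow
    exit_weight: dict branch instr -> non-negative weight of leaving
                 the trace at that branch (assumed execution counts)
    """
    instrs = list(latency)
    best = {"cost": float("inf"), "order": None}

    def search(order, issue, next_cycle, cost):
        remaining = [i for i in instrs if i not in issue]
        if not remaining:
            if cost < best["cost"]:
                best["cost"], best["order"] = cost, list(order)
            return
        # Lower bound: no unscheduled exit can issue before next_cycle.
        bound = cost + sum(
            exit_weight.get(i, 0) * (next_cycle + 1) for i in remaining
        )
        if bound >= best["cost"]:
            return  # prune: this subtree cannot beat the best schedule
        for i in remaining:
            deps = preds.get(i, [])
            if any(p not in issue for p in deps):
                continue  # not ready: a predecessor is unscheduled
            ready = max((issue[p] + latency[p] for p in deps), default=0)
            cycle = max(next_cycle, ready)
            issue[i] = cycle
            order.append(i)
            # An exit at `cycle` gives a path length of cycle + 1.
            added = exit_weight.get(i, 0) * (cycle + 1)
            search(order, issue, cycle + 1, cost + added)
            order.pop()
            del issue[i]

    search([], {}, 0, 0)
    return best["cost"], best["order"]


if __name__ == "__main__":
    latency = {"load": 3, "cmp": 1, "br1": 1, "add": 1, "br2": 1}
    preds = {"br1": ["cmp"], "add": ["load"], "br2": ["add", "br1"]}
    weights = {"br1": 30, "br2": 70}
    print(best_trace_schedule(latency, preds, weights))
```

In this toy example, issuing `load` first hides its latency behind `cmp` and `br1`, so both exits are reached as early as the dependences allow. The bound is weak but valid, since every unscheduled exit must issue at or after the next free cycle.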