This paper presents a software pipelining algorithm for the automatic extraction of fine-grain parallelism in general loops. The algorithm accounts for machine resource constraints in a way that smoothly integrates the management of resource constraints with software pipelining. Furthermore, generality in the software pipelining algorithm is not sacrificed to handle resource constraints, and scheduling choices are made with truly global information. Proofs of correctness and the results of experiments with an implementation are also presented.
Some compilers [NE90, NPW91, ME92, NN92b] do, however, partially mitigate the effect of early register allocation by removing spurious dependencies using dynamic register renaming [CF87]. Others [RG82] perform a potentially very expensive post-scheduling register reallocation.
For example, if SHIFT and ADD operations are performed by the same functional unit and have the same definitions and uses, there would be no point in keeping both in the same MUTATIONS set.
Actually, a register reference is indicated by a NULL expression, which indicates that no further computation is necessary for Val since it already resides in the register file.
If this expression is selected as a new mutation for Val, then it will be instantiated in its entirety, but the STORE and LOAD will nevertheless be scheduled separately, so that the final locations of the STORE and LOAD will not generally be in adjacent instructions.
Many techniques have been proposed for exploiting instruction-level parallelism, ranging from those that are optimal but expensive and ignore resource constraints to those that introduce resource constraints in various forms. One of the most aggressive of these techniques is Resource-Constrained Software Pipelining (RCSP) [1]. RCSP works by repeatedly scheduling successive iterations of a loop in parallel until the data and resource dependence structure of the loop causes the process to converge on a repeating scheduling pattern. This repeating pattern is then used as the new loop body. In principle, this process can be made optimal with respect to full unrolling and scheduling of the loop. Of course, this is not the same as absolute optimality; however, given the NP-hard nature of the problem and the results of [11], this may be the strongest form possible for general loop pipelining. The main drawback of RCSP is that, in practice, its space/time overhead can be fairly expensive.
In this paper, we present Resource-Directed Loop Pipelining (RDLP), a new approach that attempts to retain many of the advantages of RCSP while minimizing the expense. It does so by allowing the availability of target resources to guide, in some sense, the application of parallelism-exposing and parallelizing transformations. One of the key features of RDLP is the separation of control heuristics from transformations, which allows the loop pipelining to be as general as the underlying system of code motion transformations. Results are presented that show that, even with very unsophisticated heuristics, RDLP achieves roughly the same performance as RCSP while providing a 4-fold decrease in space/time cost.
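The idea of scheduling successive iterations until a repeating pattern emerges can be illustrated with a toy sketch. This is not the RCSP or RDLP algorithm itself: it assumes unit-latency operations, a single resource limit (`width` operations per cycle), loop-carried dependences of distance one, and a simple greedy scheduler. All names (`schedule_iterations`, `find_repeating_pattern`, `width`, `max_span`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def schedule_iterations(body, intra_deps, carried_deps, width, n_iters):
    """Greedily schedule n_iters unrolled iterations.

    body         : operation names in topological order (assumption)
    intra_deps   : op -> predecessors within the same iteration
    carried_deps : op -> predecessors in the previous iteration
    width        : maximum operations issued per cycle (assumed resource model)
    Returns a dict mapping (op, iteration) -> cycle.
    """
    cycle_of = {}
    used = defaultdict(int)  # cycle -> issue slots already taken
    for i in range(n_iters):
        for op in body:
            ready = 0
            for pred in intra_deps.get(op, ()):
                ready = max(ready, cycle_of[(pred, i)] + 1)
            if i > 0:
                for pred in carried_deps.get(op, ()):
                    ready = max(ready, cycle_of[(pred, i - 1)] + 1)
            while used[ready] >= width:  # resource constraint: cycle is full
                ready += 1
            cycle_of[(op, i)] = ready
            used[ready] += 1
    return cycle_of


def find_repeating_pattern(cycle_of, body, n_iters, max_span=4):
    """Return (k, p) if the schedule repeats every k iterations, shifted by
    p cycles, over the second half of the unrolled iterations; else None."""
    for k in range(1, max_span + 1):
        start = max(k, n_iters // 2)  # skip the start-up (prologue) region
        shifts = {
            cycle_of[(op, i)] - cycle_of[(op, i - k)]
            for i in range(start, n_iters)
            for op in body
        }
        if len(shifts) == 1:
            return k, shifts.pop()
    return None


# Example loop body: a -> b -> c, where a also depends on the previous a.
body = ["a", "b", "c"]
intra = {"b": ["a"], "c": ["b"]}
carried = {"a": ["a"]}

sched = schedule_iterations(body, intra, carried, width=2, n_iters=12)
print(find_repeating_pattern(sched, body, 12))  # prints (2, 3)
```

With two issue slots per cycle and three operations per iteration, the greedy schedule settles into a pattern that repeats every two iterations and three cycles, so a pipelined loop body built from it would cover two original iterations. Checking only the second half of a fixed number of iterations is a shortcut for this sketch; it does not prove convergence in general.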