Most scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to get optimal execution rates in parallel and/or pipeline systems. The retiming technique is a common and valuable transformation tool in one-dimensional problems, when loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism. Other existing optimization techniques for nested loops also cannot always achieve full parallelism. This paper shows an important and counter-intuitive result, which proves that we can always obtain full parallelism for MDFGs with more than one dimension. This result is obtained by transforming the MDFG into a new structure. The restructuring process is based on a multidimensional retiming technique. The theory and two algorithms to obtain full parallelism are presented in this paper. Examples of the optimization of nested loops and of digital signal processing designs are shown to demonstrate the effectiveness of the algorithms.
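To make the idea of multidimensional retiming concrete, the following minimal sketch applies a retiming function to a two-node, two-dimensional MDFG and checks whether every edge then carries a nonzero delay vector, which is the condition the abstract calls full parallelism. The graph, the names `retime`, `fully_parallel`, and `respects_schedule`, the sign convention d_r(e) = d(e) + r(u) - r(v) for an edge u -> v, and the candidate schedule vector are all assumptions chosen for illustration; this is not the paper's algorithm, which also has to find the retiming and prove that it is legal.

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, int]
Edge = Tuple[str, str, Vec]  # (source node, target node, delay vector)


def retime(edges: List[Edge], r: Dict[str, Vec]) -> List[Edge]:
    """Apply a retiming r to every edge u -> v.

    Assumed convention: d_r(e) = d(e) + r(u) - r(v).
    Nodes missing from r are treated as retimed by (0, 0).
    """
    out = []
    for u, v, d in edges:
        ru = r.get(u, (0, 0))
        rv = r.get(v, (0, 0))
        out.append((u, v, (d[0] + ru[0] - rv[0], d[1] + ru[1] - rv[1])))
    return out


def fully_parallel(edges: List[Edge]) -> bool:
    """True when no edge has the zero delay vector, i.e. no dependence
    remains between two nodes of the same iteration."""
    return all(d != (0, 0) for _, _, d in edges)


def respects_schedule(edges: List[Edge], s: Vec) -> bool:
    """Check a caller-supplied candidate schedule vector s: every delay
    vector must have a strictly positive dot product with s."""
    return all(d[0] * s[0] + d[1] * s[1] > 0 for _, _, d in edges)


# Hypothetical MDFG: A feeds B within the same iteration, and B feeds A
# with delay (0, 1).
edges = [("A", "B", (0, 0)), ("B", "A", (0, 1))]
retimed = retime(edges, {"A": (1, 0)})

print(fully_parallel(edges))    # False: edge A -> B has zero delay
print(retimed)                  # A -> B becomes (1, 0), B -> A becomes (-1, 1)
print(fully_parallel(retimed))  # True
print(respects_schedule(retimed, (1, 2)))  # True for this candidate vector
```

The delay sum around the cycle is (0, 1) both before and after retiming, so the transformation only redistributes delays. The negative component in (-1, 1) is why the sketch also checks a candidate schedule vector: the retimed loop is no longer executed in plain row-major order.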
Multidimensional (MD) systems are widely used to model scientific applications such as image processing, geophysical signal processing, and fluid dynamics. Such systems usually contain repetitive groups of operations represented by nested loops. Optimizing such loops under processing resource constraints is required in order to reduce their computation time. Most of the existing static scheduling mechanisms used in the high-level synthesis of very large scale integration (VLSI) architectures do not consider the parallelism inherent in the multidimensional characteristics of the problem. This paper explores the basic properties of MD loop pipelining and presents two novel techniques, multidimensional rotation scheduling and push-up scheduling, able to achieve the shortest possible schedule length. These new techniques transform a multidimensional data flow graph representing the problem, while assigning the loop operations to a schedule table. Multidimensional rotation scheduling is an iterative heuristic method that depends on a user input, while the push-up scheduling algorithm is able to compute the new schedule in polynomial time. The optimal resulting schedule length and the efficiency of the algorithms are demonstrated by a series of practical experiments.
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way of reducing synchronization and improving data locality. Traditional fusion techniques, however, either cannot address the case when fusion-preventing dependences exist in nested loops, or cannot achieve good parallelism after fusion. This paper gives a significant improvement by presenting several efficient polynomial-time algorithms to solve these problems. These algorithms, combined with the retiming technique, allow nested loop fusion in the presence of outermost loop-carried dependences as well as fusion-preventing dependences. Furthermore, our technique is proved to achieve fully parallel execution of the fused loops.
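As a simplified illustration of how a retiming-style shift can remove a fusion-preventing dependence, the sketch below fuses two one-dimensional loops in which the second loop reads `a[i + 1]`, a value the first loop has not yet produced at iteration `i`. The loops, the array names, and the functions `unfused` and `fused_shifted` are hypothetical; the paper's algorithms target nested loops and are not reproduced here.

```python
from typing import List, Tuple


def unfused(b: List[int]) -> Tuple[List[int], List[int]]:
    """Two separate loops; the second reads a[i + 1] from the first."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = b[i] + 1
    for i in range(n - 1):
        c[i] = a[i + 1] * 2
    return a, c


def fused_shifted(b: List[int]) -> Tuple[List[int], List[int]]:
    """One fused loop. The second statement is shifted by one iteration,
    so iteration i computes c[i - 1], which needs only a[i]."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        a[i] = b[i] + 1
        if i >= 1:
            c[i - 1] = a[i] * 2
    return a, c


data = [3, 1, 4, 1, 5]
print(unfused(data) == fused_shifted(data))  # True
```

Fusing the two loops without the shift would make iteration `i` read `a[i + 1]` before it is written. After the shift, each fused iteration reads only `b[i]` and the `a[i]` it has just written, so in this small example the iterations do not depend on one another.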