In a SIMD or VLIW machine, conceptual synchronizations are accomplished by using a static code schedule that does not require run-time synchronization. The lack of run-time synchronization overhead makes these machines very effective for fine-grain parallelism, but they cannot execute parallel code structures as general as those executed by MIMD architectures, and this limits their utility. In this paper we present a timing analysis that allows a compiler for a MIMD machine to eliminate a large fraction of the run-time synchronization by making efficient use of static code scheduling. Although these techniques can be adapted to most MIMD machines, this paper centers on the analysis and scheduling for barrier MIMD machines. Barrier MIMDs are asynchronous multiple instruction stream/multiple data stream architectures capable of parallel execution of variable execution-time instructions and arbitrary control flow (e.g., while loops and calls). However, they also incorporate a special hardware barrier synchronization mechanism that facilitates static scheduling by providing a mechanism which the compiler can use to enforce precise timing constraints. In other words, the compiler tracks relative timing between processors and uses static code scheduling until the timing imprecision becomes too large, at which point the compiler simply inserts a barrier to reduce that timing imprecision to zero (or a small constant). This paper describes new scheduling and barrier placement algorithms for barrier MIMDs that are based loosely on the list scheduling approach employed for VLIWs [Ellis 1985]. In addition, the experimental results from scheduling thousands of synthetic benchmark programs for a parameterized barrier MIMD machine are presented.
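The barrier-insertion idea in the abstract above can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified model that assumes each instruction has a known best-case and worst-case execution time, that every processor executes a straight-line stream of the same length, and that a barrier resets timing imprecision to exactly zero. The names `Op`, `place_barriers`, and `max_skew` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    """One instruction with bounded execution time (illustrative model)."""
    name: str
    min_time: int  # assumed best-case latency in cycles
    max_time: int  # assumed worst-case latency in cycles


def place_barriers(streams: List[List[Op]], max_skew: int) -> List[int]:
    """Return the step indices after which a barrier is inserted.

    For every processor we track the interval [lo, hi] of possible elapsed
    time since the last barrier. The width hi - lo is that processor's
    timing imprecision. When the largest width exceeds max_skew, a barrier
    is placed after the current step and all intervals reset to [0, 0].
    """
    if not streams:
        return []
    steps = len(streams[0])
    if any(len(s) != steps for s in streams):
        raise ValueError("this sketch assumes equal-length streams")

    lo = [0] * len(streams)
    hi = [0] * len(streams)
    barriers = []

    for step in range(steps):
        # Advance every processor by one instruction.
        for p, stream in enumerate(streams):
            lo[p] += stream[step].min_time
            hi[p] += stream[step].max_time

        # Imprecision is the widest uncertainty window on any processor.
        imprecision = max(h - l for l, h in zip(lo, hi))
        if imprecision > max_skew:
            barriers.append(step)
            lo = [0] * len(streams)  # barrier realigns all processors
            hi = [0] * len(streams)

    return barriers


# Hypothetical two-processor example with made-up latencies.
streams = [
    [Op("load", 1, 4), Op("add", 1, 1), Op("mul", 2, 5)],
    [Op("add", 1, 1), Op("load", 1, 4), Op("add", 1, 1)],
]
print(place_barriers(streams, max_skew=4))
```

A smaller `max_skew` yields more barriers and tighter timing knowledge; a larger one relies more on static scheduling and tolerates looser timing. A real compiler would also have to handle control flow and cross-processor dependences, which this sketch ignores.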
Partitioning the iteration space can significantly affect the execution time of a loop. In this paper, we propose an improvement over previous partitioning methods for single loops with uniform data dependencies. For distributed memory systems, partitioning each loop separately does not guarantee an efficient execution of the code because of across-loop data dependence. As a result, a global iteration space is formed so that all loops in a program are considered when partitioning the global space. In addition, a new and general form of expressing data dependence called hyperplane dependence is introduced and used in the partitioning. It is a dependence whose source and destination are subspaces (of any dimension) of the global iteration space.