This paper presents a formal mathematical framework which unifies the existing loop transformations, such as statement reordering. This framework also includes more general classes of loop transformations, which can extract more parallelism from a class of programs than the existing techniques. The particular class of programs consists of perfectly nested loops, possibly with conditional statements, where the guards as well as the array index expressions are affine expressions of the loop indices. We classify schedules into three classes: uniform, subdomain-variant, and statement-variant. Viewed from the degree of parallelism to be gained by loop transformation, the schedules can also be classified as single-sequential-level, multiple-sequential-level, and mixed schedules. We also illustrate the usefulness of the more general loop transformations with an example program.

In the next section, we describe the notation and terminology used in the paper. We then present a formal mathematical framework which unifies the existing loop transformation techniques and sets the stage for discussing the more general classes of loop transformers in Section 3. A loop transformer is a function that relates a given loop nest with its transformed version. The problem formulations for obtaining these schedules are based on ...

In Proceedings of the Third ACM SIGPLAN Symposium on PPOPP, April 1991.
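The idea of a schedule extracting parallelism can be illustrated with a classic example that is not taken from the paper itself: a perfectly nested loop with affine index expressions whose dependences forbid parallelizing either loop index directly, but whose skewed (wavefront) schedule leaves a single sequential level with fully independent iterations inside each wavefront. All names and the NumPy setup below are illustrative only.

```python
import numpy as np

def original(a):
    # Perfectly nested loop with affine index expressions:
    # a[i, j] depends on a[i-1, j] and a[i, j-1], so neither
    # the i-loop nor the j-loop is parallel as written.
    n = a.shape[0]
    for i in range(1, n):
        for j in range(1, n):
            a[i, j] = a[i - 1, j] + a[i, j - 1]
    return a

def skewed(a):
    # Skewed schedule: iterate over wavefronts w = i + j.
    # Both dependences of an iteration lie on wavefront w - 1,
    # so all iterations on one wavefront are independent; w is
    # the single sequential level of this schedule.
    n = a.shape[0]
    for w in range(2, 2 * n - 1):
        for i in range(max(1, w - n + 1), min(n, w)):
            j = w - i
            a[i, j] = a[i - 1, j] + a[i, j - 1]
    return a
```

Both schedules visit every iteration exactly once and respect the dependences, so they compute identical results; only the order (and hence the exploitable parallelism) differs.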
This article reports on experiments from our ongoing project whose goal is to develop a C++ library which supports adaptive and irregular data structures on distributed-memory supercomputers. We demonstrate the use of our abstractions in implementing "tree codes" for large-scale N-body simulations. These algorithms require dynamically evolving treelike data structures, as well as load balancing, both of which are widely believed to make the application difficult and cumbersome to program for distributed-memory machines. The ease of writing the application code on top of our C++ library abstractions (which themselves are application-independent), and the low overhead of the resulting C++ code (over hand-crafted C code), support our belief that object-oriented approaches are eminently suited to programming distributed-memory machines in a manner that (to the applications programmer) is architecture-independent. Our contribution in parallel programming methodology is to identify and encapsulate general classes of communication and load-balancing strategies useful across applications and MIMD architectures. This article reports experimental results from simulations of half a million particles using multiple methods.
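The "dynamically evolving treelike data structures" the abstract refers to can be illustrated by the well-known Barnes-Hut quadtree, one common kind of N-body tree code. The sketch below is a minimal sequential illustration under that assumption; it does not reproduce the paper's C++ library abstractions, and all class and method names are hypothetical.

```python
# Minimal Barnes-Hut-style quadtree sketch (illustrative only).
# Each node covers a square cell; leaves hold at most one body,
# and inserting into an occupied leaf subdivides it on the fly,
# which is what makes the tree "dynamically evolving".

class Node:
    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell center and half-width
        self.body = None       # (x, y, mass) tuple while this is an occupied leaf
        self.children = None   # four subcells once the cell is subdivided

    def quadrant(self, x, y):
        # Child index: bit 0 = right of center, bit 1 = above center.
        return (x >= self.cx) + 2 * (y >= self.cy)

    def insert(self, x, y, m):
        if self.children is None and self.body is None:
            self.body = (x, y, m)              # empty leaf: store the body here
            return
        if self.children is None:              # occupied leaf: subdivide first
            h = self.half / 2
            self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                             for dy in (-1, 1) for dx in (-1, 1)]
            bx, by, bm = self.body
            self.body = None
            self.children[self.quadrant(bx, by)].insert(bx, by, bm)
        self.children[self.quadrant(x, y)].insert(x, y, m)
```

In a distributed-memory setting of the kind the article describes, such a tree would additionally be partitioned across processors and rebalanced as bodies move, which is exactly the machinery the authors' library encapsulates.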