This paper describes an integrated architecture, compiler, runtime, and operating system solution to exploiting heterogeneous parallelism. The architecture is a pipelined multithreaded multiprocessor, enabling the execution of very fine (multiple operations within an instruction) to very coarse (multiple jobs) parallel activities. The compiler and runtime focus on managing parallelism within a job, while the operating system focuses on managing parallelism across jobs. By considering the entire system in the design, we were able to smoothly interface its four components. While each component is primarily responsible for managing its own level of parallel activity, feedback mechanisms between components enable resource allocation and usage to be dynamically updated. This dynamic adaptation to changing requirements and available resources fosters both high utilization of the machine and the efficient expression and execution of parallelism.
The development of Tera's MTA system was unusual. It respected the need for fast hardware and large shared memory, facilitating execution of the most demanding parallel application programs. But at the same time, it met the need for a clean machine model enabling calculated compiler optimizations and easy programming, and the need for novel architectural features necessary to support fast parallel system software. From its inception, system and application needs have molded the MTA architecture. The result is a system that offers high performance and ease of programming by virtue not only of fast physical hardware and flat shared memory, but also of the streamlined software systems that well utilize the features of the architecture intended to support them.
In parallel programming, the need to manage communication costs, load imbalance, and irregularities in the computation puts substantial demands on the programmer. Key properties of the architecture, such as the number of processors and the costs of communication, must be exploited to achieve good performance. Coding these properties directly into a program compromises the portability and flexibility of the code because significant changes are usually needed to port or enhance the program. We describe a parallel programming model that supports the concise, independent description of key aspects of a parallel program, such as data distribution, communication, and boundary conditions, without reference to machine idiosyncrasies. The independence of such components improves portability by allowing the components of a program to be tuned independently, and encourages reuse by supporting the composition of existing components. The architecture-sensitive aspects of a computation are isolated from the rest of the program, reducing the need to make extensive changes to port a program. This model is effective in exploiting both data parallelism and functional parallelism. This paper provides programming examples, compares this work to related languages, and presents performance results.