Exploitation of APL data parallelism on a shared-memory MIMD machine

Ju, Dz-ching; Ching, Wai-Mee

doi:10.1145/109626.109633

Cited by 4 publications

(5 citation statements)

References 12 publications

“…Ju and Ching [40] presented similar results. Although they were aware of the benefits of loop fusion, their compiler did not perform this transformation automatically.…”

Section: APL (supporting)

confidence: 58%

Compiling nested data-parallel programs for shared-memory multiprocessors

Chatterjee

1993

ACM Trans. Program. Lang. Syst.

While data parallelism is well suited from algorithmic, architectural, and linguistic considerations to serve as a basis for portable parallel programming, its characteristic fine-grained parallelism makes the efficient implementation of data-parallel languages on MIMD machines a challenging task. The design, implementation, and evaluation of an optimizing compiler are presented for an applicative nested data-parallel language called VCODE targeted at the Encore Multimax, a shared-memory multiprocessor. The source language supports nested aggregate data types; aggregate operations including elementwise forms, scans, reductions, and permutations; and conditionals and recursion for control flow. A small set of graph-theoretic compile-time optimizations reduce the overheads on MIMD machines in several ways: by increasing the grain size of the output program, by reducing synchronization and storage requirements, and by improving locality of reference. The two key ideas behind these optimizations are the symbolic analysis of loop structures and hierarchical clustering of the program graph, first by loop structure and then by loop traversal patterns. A benchmark suite demonstrates both the efficiency of the output code and the effectiveness of the optimization.
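The grain-size and storage points in the abstract above can be illustrated with loop fusion, the transformation named in the citation statement. The following is a minimal sketch in Python, not VCODE or the compiler's actual output; the function names `unfused` and `fused` are hypothetical and chosen only for illustration.

```python
def unfused(a, b, c):
    # Two separate elementwise passes. The first materializes a
    # full-length temporary, and a parallel runtime would need a
    # synchronization point between the two loops.
    tmp = [x + y for x, y in zip(a, b)]
    return [t * z for t, z in zip(tmp, c)]


def fused(a, b, c):
    # One pass: each iteration does both operations, so there is more
    # work per iteration (larger grain) and no temporary vector.
    return [(x + y) * z for x, y, z in zip(a, b, c)]
```

Both functions compute the same result; the fused form simply does it in a single traversal.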

“…In [9], we propose a dynamic processor allocation scheme which can dynamically determine the required number of processors involved in a parallel block. This static analysis, with the aid of the dynamic processor allocation scheme, can make the generated parallel code execute efficiently.…”

Section: Proof (mentioning)

confidence: 99%
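The statement above says the number of processors for a parallel block is determined dynamically but does not give the rule. One plausible shape for such a decision is sketched below; the function name `processors_for_block`, the `min_grain` threshold, and its default of 1024 elements are all assumptions for illustration, not details taken from [9].

```python
def processors_for_block(n_elements, max_procs, min_grain=1024):
    # Hypothetical heuristic: use only as many processors as can each
    # receive at least `min_grain` elements, capped at `max_procs`.
    if n_elements <= 0:
        return 1
    wanted = -(-n_elements // min_grain)  # ceiling division
    return max(1, min(max_procs, wanted))
```

Under this assumed rule, a small vector runs on one processor and avoids the cost of starting parallel subtasks, while a large one uses every available processor.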

“…In order to validate the above claims on an existing parallel machine, the second author has developed an APL/C compiler and the first author has implemented a parallel run-time environment targeted at a shared-memory multiprocessor machine at IBM T.J. Watson Research Center during the summer of 1990 [9]. The APL/C compiler generates C source code instead of assembly code for any particular machine.…”

Section: Introduction (mentioning)

confidence: 99%

“…Our simple APL parallel execution model [9] is based on assigning each parallel subtask to a process. All processes share the same memory address space, and one of them is the main process, which is responsible for the execution of the sequential portion of a program, including all operations on scalar data.…”

Section: Introduction (mentioning)

confidence: 99%
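The execution model described in the statement above (a main process for sequential and scalar work, with each parallel subtask assigned to a process sharing the same address space) can be sketched roughly as follows. This is an illustration only: Python threads stand in for the processes, and the names `parallel_elementwise` and `main_program` and the `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_elementwise(op, xs, ys, workers=4):
    # Split one elementwise operation into subtasks over disjoint
    # index ranges; all workers write into the same shared result list.
    n = len(xs)
    out = [None] * n
    chunk = (n + workers - 1) // workers if n else 0

    def subtask(lo):
        for i in range(lo, min(lo + chunk, n)):
            out[i] = op(xs[i], ys[i])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Leaving the `with` block waits for every subtask to finish.
        list(pool.map(subtask, range(0, n, chunk or 1)))
    return out


def main_program(xs, ys):
    # Scalar operation: executed sequentially by the main thread.
    scale = 2 + 1
    # Array operation: handed out to the workers as parallel subtasks.
    return parallel_elementwise(lambda x, y: scale * (x + y), xs, ys)
```

Because the subtasks write to disjoint slices of the shared result, no locking is needed beyond the join at the end of the parallel block.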

“…With such a code generation policy, we achieve a reasonably good performance [9]. We also attempt to optimize the generated code, and there have been a number of code optimization techniques incorporated into APL compilers [2, 4, 10, 11].…”

Section: Introduction (mentioning)

confidence: 99%

On performance and space usage improvements for parallelized compiled APL code

Ching

1991

Proceedings of the International Conference on APL '91

Self Cite

Loop combination has been a traditional optimization technique employed in APL compilers, but may introduce dependencies into the combined loop. We propose an analysis method by which the compiler can keep track of the change of the parallelism when combining high-level primitives. The analysis is necessary when the compiler needs to decide a trade-off between more parallelism and a further combination. We also show how the space usage, as well as the performance, improves by using system calls with the aid of garbage collection to implement a dynamic memory allocation. A modification of the memory management scheme can also increase available parallelism. Our experimental results indicate that the performance and the space usage improve appreciably with the above enhancements.
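The first sentence of the abstract above, that combining loops may introduce dependencies, can be seen when an elementwise primitive is combined with a scan. The sketch below is an assumed example in Python rather than APL or the compiler's generated C; the names `separate` and `combined` are hypothetical.

```python
from itertools import accumulate


def separate(a, b):
    # Elementwise multiply: every iteration is independent, so this
    # loop could be split freely across processors.
    prod = [x * y for x, y in zip(a, b)]
    # Plus-scan over the products, as a second primitive.
    return list(accumulate(prod))


def combined(a, b):
    # Combined loop: no temporary vector, but `running` is carried
    # from one iteration to the next, so the multiply can no longer
    # be run as an independent parallel loop.
    out, running = [], 0
    for x, y in zip(a, b):
        running += x * y
        out.append(running)
    return out
```

The combined form saves a temporary and a pass over the data, at the cost of the parallelism that the multiply had on its own, which is the kind of trade-off the abstract says the compiler has to weigh.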
