Because of the large amount of potential parallelism, resource management is a critical issue in multithreaded execution. The challenge in code generation is to control the parallelism without reducing the machine's ability to exploit it. Controlled parallelism reduces idle time, communication, and delay caused by synchronization. At the same time, it increases the potential for exploiting locality in program data structures. In this paper, we evaluate the performance of methods to control program parallelism and resource usage in the context of the fine-grain dataflow execution model. The methods themselves are not new, but their performance analysis is. The two methods for controlling parallelism studied here are slicing and chunking. We present the methods and their compilation strategy and evaluate their effectiveness in terms of run time and matching store occupancy. Communication is categorized into memory, loop, call, and expression communication. Input and output message locality is measured. Two techniques to reduce communication are introduced: grouping allocates loop and function bodies to one processor, and bundling combines messages with the same sender and receiver into one. Their effects on the total communication volume are quantified.
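As a minimal illustration of the bundling idea (combining messages with the same sender and receiver into one), the sketch below groups messages by their (sender, receiver) pair. It is not the paper's implementation: the tuple representation of a message and the function name `bundle` are assumptions made for this example, and the sketch ignores when messages become available, which a real code generator would have to take into account.

```python
from collections import defaultdict


def bundle(messages):
    """Combine messages that share a (sender, receiver) pair into one bundle.

    `messages` is an iterable of (sender, receiver, payload) tuples; this
    representation is hypothetical and chosen only for illustration.
    Returns a list of (sender, receiver, [payloads]) tuples, ordered by the
    first appearance of each (sender, receiver) pair.
    """
    bundles = defaultdict(list)
    for sender, receiver, payload in messages:
        bundles[(sender, receiver)].append(payload)
    return [(s, r, payloads) for (s, r), payloads in bundles.items()]


# Four individual messages, two of which travel from processor 0 to processor 1.
messages = [(0, 1, "a"), (0, 2, "b"), (0, 1, "c"), (2, 1, "d")]
bundled = bundle(messages)
print(len(messages), "messages ->", len(bundled), "bundles")
```

In this toy input, the two messages from processor 0 to processor 1 travel as one bundle, so the message count drops from four to three while the payloads delivered stay the same.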
Academic Press