Abstract-This paper explores the problem of efficiently ordering interprocessor communication (IPC) operations in statically scheduled multiprocessors for iterative dataflow graphs. In most digital signal processing (DSP) applications, the throughput of the system is significantly affected by communication costs. By explicitly modeling these costs within an effective graph-theoretic analysis framework, we show that ordered transaction schedules can significantly outperform self-timed schedules even when synchronization costs are low. However, we also show that when communication latencies are nonnegligible, finding an optimal transaction order given a static schedule is an NP-complete problem, and that this intractability holds both under iterative and noniterative execution. We develop new heuristics for finding efficient transaction orders, and perform an extensive experimental comparison to gauge the performance of these heuristics.Index Terms-Dataflow, multiprocessor, scheduling, synchronization.
I. BACKGROUNDT HIS paper explores the problem of efficiently ordering interprocessor communication (IPC) operations in statically scheduled multiprocessors for iterative dataflow specifications. An iterative dataflow specification consists of a dataflow representation of the body of a loop that is to be iterated indefinitely. Dataflow programming in this form is used widely in the design and implementation of digital signal processing (DSP) systems. In this paper, we assume that we are given a dataflow specification of an application, and an associated multiprocessor schedule (e.g., derived from scheduling techniques such as those presented in [1]-[4]). Our objective is to reduce the overall IPC cost of the multiprocessor implementation, and the associated performance degradation, since IPC operations result in significant execution time and power consumption penalties, and are difficult to optimize thoroughly during the scheduling stage.High-density, low-power, and inexpensive multiprocessor DSP solutions are in growing demand in telecom, internet routing, and IP telephony markets. Typical application areas include voice compression, echo cancellation, modem banks, and voice-over-IP. As transistor sizes shrink and processors