Many real-time signal processing applications are dominated by iterative loop constructs which exhibit a large amount of parallelism. In general, a realisation matched to the required rate of these applications exploits only a relatively small part of the parallelism available in the algorithm. This paper addresses the important problem of selecting the appropriate algorithmic-level decisions, in particular loop manipulations and the like, to arrive at an area-optimized specification for use in register-transfer level synthesis tools. One of the crucial cost factors in this optimisation is related to memory storage. An effective model and methodology are proposed to derive an optimized architecture with fully matched throughput, while avoiding a full traversal of the large search space. The effectiveness of our approach is substantiated with several realistic test cases.
1: Introduction

Many real-time signal processing (RSP) applications in video, image, speech and telecom processing are dominated by iterative loop constructs which exhibit a large amount of parallelism. Still, the loop organisation is not regular for most realistic algorithms in this target domain [1, 25]. In general, a fixed-rate realisation of these applications exploits only a relatively small part of the available parallelism. Hence, the main problem is not only to derive upper bounds on the amount of parallelism and on the achievable throughput [2, 21, 23, 16, 22], but especially to select a loop organisation matched to the required throughput. This important problem necessitates the traversal of a very large search space and has not been well researched yet. Only very recently has some research addressed part of this problem. It is oriented mostly towards good estimators to guide the search for feasible solutions under certain restrictions, combined with a hierarchical design space exploration.

This paper addresses the problem of selecting the appropriate loop manipulations on the initial behavioural description, to arrive at an area-optimized specification with matched throughput. The latter can then be used as input for register-transfer level synthesis tools, i.e. detailed allocation, assignment and scheduling. The cost function that steers these loop transformations should include not only the data-path area but also the crucial foreground and background memory. Indeed, these storage-related factors can be quite dominant over the other costs in multi-dimensional signal processing, as in video, image and speech processing [1, 25]. Up to now, they have been mostly neglected in the conventional high-level synthesis approaches. Also in the array synthesis community (see most P21.
*Professor at the Katholieke Universiteit Leuven.
This research has been partly sponsored by the E.C. projects ESPRIT-2260 (SPRITE) and HCM-ERBCHRXCT930382 (Vision algorithms and optical computer architectures).