In recent years, the semiconductor industry has turned its focus towards heterogeneous multi-processor platforms. They are an economically viable solution for coping with the growing setup and manufacturing costs of silicon systems. Furthermore, their inherent flexibility supports the emerging market of interactive, mobile data and content services. The platform's performance and energy consumption depend largely on how well the data-dominated services are mapped onto the memory subsystem. A crucial aspect is how efficiently data is transferred between the different memory layers. Several compilation techniques have been developed to make optimal use of the available bandwidth. Unfortunately, they do not take into account the interaction between multiple threads running on the different processors, they optimize the bandwidth only locally, and they do not deal with the dynamic behavior of these applications. The contributions of this chapter are to outline the main limitations of current techniques and to introduce an approach for dealing with the dynamic, multi-threaded nature of our application domain.
The design challenges of media-rich services

Business analysts forecast a 250 billion dollar market for media-rich, mobile wireless terminals [53]. These systems require enormous computational performance (40 GOPS, i.e., 40 giga operations per second). Even though current PCs offer this performance, they consume too much power (10-100 W). Mobile devices should consume at least two to three orders of magnitude less power [30]. Furthermore, they must be cheap in order to penetrate the consumer market successfully. Consequently, and in spite of these design challenges, the engineering and manufacturing costs need to be reduced. Industry strongly believes that platforms are a potential way to meet the above challenges.
The era of platform-based design

A platform is a fixed micro-architecture together with a programming environment that minimizes mask-making costs and is flexible enough to work for a set of applications [4]. The production volumes can then remain high over an extended chip lifetime.

Given the strong energy constraints, we must choose the flavor of these platforms. Since power grows roughly cubically with the processing frequency when the supply voltage is scaled along with it, parallelism is an effective way to reduce power and energy consumption. Multiple simple processors are therefore preferred to one complex, speculative, out-of-order processor. In the right application domain, they deliver better performance while spending less energy. Besides parallelism, heterogeneity is another way to decrease the energy cost. For instance, the TI OMAP platform combines a RISC processor with a Digital Signal Processor (DSP). The RISC is more energy-efficient for the input/output processing and simple control-dominated