In recent years, the semiconductor industry has turned its focus towards heterogeneous multiprocessor platforms. They are an economically viable solution for coping with the growing setup and manufacturing cost of silicon systems. Furthermore, their inherent flexibility supports the emerging market of interactive, mobile data and content services. The platform's performance and energy consumption depend largely on how well the data-dominated services are mapped onto the memory subsystem. A crucial aspect is how efficiently data is transferred between the different memory layers. Several compilation techniques have been developed to make optimal use of the available bandwidth. Unfortunately, they do not take the interaction between multiple threads into account and do not deal with the dynamic behaviour of these novel applications. The main limitations of current techniques are outlined and an approach for dealing with them is introduced.
Design challenges of media-rich services
Business analysts forecast a 250 billion dollar market for media-rich, mobile wireless terminals [1]. These systems require an enormous computational performance: 40 gigaoperations per second (GOPS). Even though current PCs could provide sufficient performance, they are too power-hungry (10-100 W). Mobile devices should consume at least two or three orders of magnitude less [2]. Furthermore, they should be cheap to successfully penetrate the consumer market. Consequently, and in spite of the design issues, the engineering and manufacturing costs need to be reduced. Industry strongly believes that platforms are a potential way to meet the above challenges.
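To make the size of this gap concrete, the figures quoted above can be turned into a rough efficiency target. The following is only a back-of-the-envelope sketch: it assumes the full 40 GOPS must still be sustained at the reduced power, and it applies both reduction factors to both ends of the quoted PC power range. The function and variable names are illustrative and do not come from the original text.

```python
# Figures quoted in the text above.
REQUIRED_GOPS = 40.0          # computational demand of a media-rich terminal
PC_POWER_W = (10.0, 100.0)    # power range quoted for current PCs
REDUCTION = (100.0, 1000.0)   # "two or three orders of magnitude less"


def efficiency_gops_per_watt(gops, watts):
    """Return the computational efficiency in GOPS per watt."""
    return gops / watts


def mobile_budgets():
    """Return (pc_watts, reduction, mobile_watts, gops_per_watt) per combination.

    Assumption: every reduction factor is applied to every PC power figure,
    and the 40 GOPS requirement is kept unchanged.
    """
    rows = []
    for pc_w in PC_POWER_W:
        for factor in REDUCTION:
            mobile_w = pc_w / factor
            eff = efficiency_gops_per_watt(REQUIRED_GOPS, mobile_w)
            rows.append((pc_w, factor, mobile_w, eff))
    return rows


if __name__ == "__main__":
    for pc_w in PC_POWER_W:
        eff = efficiency_gops_per_watt(REQUIRED_GOPS, pc_w)
        print(f"PC at {pc_w:g} W: {eff:g} GOPS/W")
    for pc_w, factor, mobile_w, eff in mobile_budgets():
        print(f"{pc_w:g} W / {factor:g} = {mobile_w:g} W -> {eff:g} GOPS/W")
```

Under these assumptions the mobile power budget lies somewhere between 10 mW and 1 W, so the required efficiency is between 40 and 4000 GOPS/W, compared with 0.4 to 4 GOPS/W for a PC delivering the same throughput.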
Era of platform-based design
A platform is a fixed microarchitecture together with a programming environment that minimises mask-making costs and is flexible enough to work for a set of applications [3]. The production volumes can then remain high over an extended chip lifetime.
To cope with the energy constraints, platforms usually consist of multiple processors. Since power grows roughly with the cube of the processing frequency when the supply voltage is scaled along with it, spreading the work over several processors running at a lower frequency is an effective way to reduce power. Therefore, on most platforms two or more processors are integrated. Besides parallelism, heterogeneity is an alternative way to decrease the energy cost. For instance, the TI OMAP platform combines a RISC processor with a digital signal processor (DSP). The RISC is more energy-efficient for the input/output processing and control-dominated applications. The DSP, however, provides the computational performance for audio and video processing, while keeping the energy cost bounded. Indeed, taking a look at the current market offerings (e.g. ST Nomadik