Energy-efficient embedded computing enables new application scenarios in mobile devices, such as software-defined radio and video processing. The hierarchical multiprocessor considered in this work may contain dozens or hundreds of resource-efficient VLIW CPUs. Programming this many CPU cores is a complex task that requires compiler support. The stream programming paradigm provides properties that help to support automatic partitioning. This work describes a compiler for streaming applications targeting the self-built hierarchical CoreVA-MPSoC multiprocessor platform. The compiler is supported by a programming model that is tailored to fit the streaming programming paradigm. We present a novel simulated annealing (SA) based partitioning algorithm called Smart SA. With Smart SA, the overall speedup is 12.84 for an MPSoC with 16 CPU cores compared to a single-CPU implementation. Comparison with a state-of-the-art partitioning algorithm shows an average performance improvement of 34.07%.
I. INTRODUCTION

The decreasing feature size of microelectronic circuits allows for the integration of more and more processing cores on a single chip. A Multiprocessor System-on-Chip (MPSoC) may consist of dozens of processing elements, such as CPU cores or specialized hardware accelerators, connected by a high-speed communication infrastructure, i.e., a Network-on-Chip (NoC). However, mapping general-purpose applications to a large number of MPSoC processing elements remains a nontrivial task. Manually writing low-level code for each core makes it difficult to experiment with different decompositions and mappings of computation to processors. Alternatively, higher-level programming frameworks allow the compiler to evaluate a larger design space when mapping the application to different hardware configurations. Efficient mapping algorithms are important for finding optimized solutions. The streaming paradigm provides regular, repeating computation and independent filters with explicit communication. This allows compilers to more easily exploit the task, data, and pipeline parallelism commonly found in signal processing, multimedia, network processing, cryptology, and similar application domains.

A popular stream-based programming language is StreamIt [1], [2]. The key principle of this language is to provide information about the inherent parallelism of the program by using a structured data flow graph. This graph consists of filters, pipelines, split-joins, and feedback loops.

In this paper we present a compiler for the StreamIt language targeting the self-built CoreVA-MPSoC architecture. The CoreVA-MPSoC is a highly scalable multiprocessor system based on a hierarchical communication infrastructure and the configurable VLIW processor CoreVA.

This paper is organized as follows: Section II describes our CoreVA-MPSoC hardware architecture. In Section III we discuss our StreamIt compiler with a focus on our novel simulated annealing partitioning algorithm (Smart SA).
The communication model proposed in this work is presented in S...