Cover image
Chip layout of an ePUMA system with 8 compute clusters, each with a local controller, 8 compute cores and 1536 kB of local compute data memory. The complete chip contains 73 processor cores and occupies 45 mm² in 28 nm FDSOI technology.
(Parts of this thesis are reprinted with permission from the IEEE.)
Printed by LiU-Tryck, Linköping University, Linköping, Sweden, 2016
Abstract
In the last ten years, limited clock frequency scaling and increasing power density have shifted IC design focus towards parallelism, heterogeneity and energy efficiency. Improving energy efficiency is by no means simple and it calls for a reevaluation of old design choices in processor architecture and, perhaps more importantly, the development of new programming methodologies that exploit the features of modern architectures.

This thesis discusses the design of energy-efficient digital signal processors with application-specific instruction sets, so-called ASIP-DSPs, and their programming tools. Target applications for such processors include, but are not limited to, communications, multimedia, image processing, intelligent vision and radar. These applications are often implemented by a limited set of kernel algorithms, whose performance and efficiency are critical to the application's success. At the same time, the extreme non-recurring engineering cost of system-on-chip designs means that product lifetime must be kept as long as possible. Neither general-purpose processors nor non-programmable ASICs can meet both the flexibility and efficiency requirements, and ASIPs may instead be the best trade-off between all the conflicting goals.

The focus of traditional superscalar and VLIW processor design has been to improve the throughput of fine-grained instructions, which results in high flexibility but also high energy consumption. SIMD architectures, on the other hand, are often restricted by inefficient data access. The result is architectures that spend more energy and/or time on supporting operations than on actual computing.

In addition to the hardware design, this thesis also discusses parallel programming flow for distributed memory architectures and ePUMA application implementation. A DSP kernel programming language and its compiler are presented.
This effectively demonstrates how kernels written in a high-level language can be translated into HOF instructions for very high processing efficiency.
Popular science summary
Preface
This thesis includes material from the following first-author publications:
• Andréas Karlsson, Joar Sohl and Dake Liu. Energy-efficient sorting with the distributed memory architecture ePUMA. In Proceedings of the International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2015.
• Andréas Karlsson, Joar Sohl and Dake Liu. Cost-efficient Mapping of 3- and 5-point DFTs to General Baseband Processors. In Proceedings of the International Conference on Digital Signal Processing (DSP), 2015.
• Andréas Karlsson, Joar Sohl and Dake Liu. Software-based QPP Interleaving for Baseband DS...