The rise of many-core processor architectures in the market responds to a constantly growing need for processing power to solve increasingly challenging problems, such as those in big data computing. Fast computation is increasingly limited by the very high power it requires and by the management of the considerable heat it produces. Many programming models compete to take advantage of many-core architectures to improve both execution speed and energy consumption, each with its own advantages and drawbacks. The work described in this thesis is based on the dataflow computing approach and investigates the benefits of a carefully pipelined execution of streaming applications, focusing in particular on off- and on-chip memory accesses. As a case study, we implement classic and on-chip pipelined versions of mergesort for Intel SCC and Xeon. We see how the benefits of the on-chip pipelining technique are bounded by the underlying architecture, and we explore the problem of fine-tuning streaming applications for many-core architectures to optimize for energy given a throughput budget. We propose a novel methodology to compute schedules optimized for energy efficiency given a fixed throughput target. We introduce Drake, derived from Schedeval, a tool that generates pipelined applications for many-core architectures and allows testing the performance, in time or energy, of their static schedules. We show that streaming applications based on Drake compete with specialized implementations, and we use Schedeval to demonstrate performance differences between schedules that a simple model would otherwise consider equivalent.
This work has been supported in part by CUGS (the Graduate School in Computer Science, Sweden), Vetenskapsrådet, SeRC and EU FP7 EXCESS.
Department of Computer and Information Science
Linköping University
SE-581 83 Linköping, Sweden

Acknowledgements

I would like to thank all members of PELAB for the inspiring and stimulating working environment as well as the passionate discussions around coffee or tea. In particular, my thanks to Kristian Sandahl for his efforts at maintaining a strong group culture. Warm thanks to Christoph Kessler for all the fruitful discussions and ideas when my imagination fell short, for his support and patience when results came late, and for entrusting me with opportunities and responsibilities that gave me valuable experience. Many thanks to Jörg Keller for sharing his detailed ideas about my work, and for always carefully reviewing and challenging any hypothesis I proposed, as it repeatedly resulted in strengthening good ideas and discarding bad ones. I would like to thank all students who participated in the work, providing valuable help in making progress. I am grateful to Intel for providing the Single-Chip Cloud Computer research prototype and for their efforts to help in the numerous moments where nothing worked. Many thanks to the National Supercomputer Center (NSC) for providing powerful computing means to run my experiments, TUS for always providing a quick and precious technica...