The Internet of Things, sensor networks, and social networks, to mention just a few examples, all contribute to the firm establishment of the Big Data era. High Performance Computing (HPC) is becoming necessary for the efficient processing of the massive amounts of data our society generates, and cloud computing is a critical component for delivering this processing power to a broader audience that cannot afford to acquire and maintain such complex computing systems themselves. However, HPC-specific technology and performance cannot yet be delivered efficiently over highly flexible and dynamic environments such as typical virtualized cloud infrastructures.

In this thesis, we address challenges that arise in high-performance dynamic cloud environments equipped with HPC-specific technology, in the context of networking and virtualization. We use InfiniBand, a high-performance lossless interconnection network, as the basis of our research, and first show that lossless networks pose major challenges when the infrastructure is highly dynamic, i.e., when it undergoes continuous changes. We then propose a network I/O virtualization architecture, the InfiniBand vSwitch architecture, that can make lossless network technologies more attractive for the cloud. Moreover, we propose different network reconfiguration methods that enable performance-driven reconfigurations in the very large network topologies commonly found in data centers. Performance-driven reconfigurations are frequently needed to adapt to unpredictable workload changes resulting from the shared and on-demand nature of a cloud platform, or when cloud providers employ live migration of virtual machines to optimize the resource usage of their infrastructure. Last but not least, we propose a new Quality-of-Service metric, called delay, to capture the directly observable service degradation in consolidated cloud environments.
We suggest that delay can be used as a direct service level agreement (SLA) metric between cloud providers and cloud tenants.
Acknowledgements

First, I would like to express my gratitude to my three supervisors: Ernst Gunnar Gran, Tor Skeie and Kyrre Begnum. Without their positivity, supervision, suggestions and encouragement, my PhD journey would not have been so fruitful and enjoyable through both the good and the hard times. I would also like to thank Bjørn Dag Johnsen from Oracle Norway, our closest collaborator in the ERAC project, which mainly funded this thesis, for his enthusiastic attitude and his participation in the important discussions that shaped the direction of my work and gave it industrial relevance. My appreciation also goes to Feroz Zahid, the most easygoing colleague I could imagine sharing an office with as well as a brilliant and ambitious associate, and to Sven-Arne Reinemo for his supervision during my early days.

A special thank you must be directed to Hårek Haugerud, Anis Yazidi, Hugo Lewi Hammer, Laurence Marie Anna Habib and the whole NETSYS group at the Oslo and Akershus...