Optical networks play a crucial role in the provisioning of grid and cloud computing services. Their high bandwidth and low latency effectively give users universal access to computational and storage resources, which can thus be fully exploited without limiting performance penalties. Given the rising importance of such cloud/grid services hosted in (remote) data centers, the various users (ranging from academics and enterprises to non-professional consumers) are increasingly dependent on the network connecting these data centers, which must be designed to ensure maximal service availability, i.e., to minimize interruptions. In this chapter we outline the challenges in the design, i.e., dimensioning, of large-scale backbone (optical) networks interconnecting data centers. This involves extending the classical routing and wavelength assignment (RWA) algorithms to so-called anycast RWA, and also jointly dimensioning not just the network but also the data center resources (i.e., servers). We specifically focus on resiliency, given how critical the grid/cloud infrastructure is to today's businesses, and, for highly critical services, we also include specific design approaches to achieve disaster resiliency.
INTRODUCTION

Back in the 1960s, John McCarthy envisioned the concept of "computation as a public utility", making computing power as easily accessible as the classical utilities that provide users with water, gas, and electricity. That seminal idea reappeared in the 1990s in the form of grid computing, borrowing its name from the power grid, where "the grid" was intended to be a highly powerful computing resource that scientists could easily tap into to perform challenging tasks. Similarly, today's cloud computing paradigm is built on the idea of relieving the user from worrying about the resources required to run applications and to store data, as well as on the idea of enabling access to such applications and data from basically any device. Clearly, such a concept is possible only through a high-capacity, low-latency network that connects the user to "the cloud", i.e., the distributed computing/storage resources. Undeniably, the development of optical network technology has been a major driver in the realization of such grids/clouds. The rise of broadband access networks and of high-speed optical networking in wide area networks (WANs) has increased the geographical scale of distributed computing paradigms, extending their range from on-site computing facilities to the cost-efficient aggregation of IT resources for both processing and storage in large-scale data centers. These can now supply a broad spectrum of applications, serving a wide audience ranging from end consumers and business users to scientists requiring high performance computing (HPC) facilities. Basic concepts underlying so-called grid technology, originating in the e-Science domain (e.g., to process massive data flows from the Large Hadron Collider (LHC) at CE...