By introducing programmability, automated verification, and innovative debugging tools, Software-Defined Networks (SDNs) are poised to meet the increasingly stringent dependability requirements of today's communication networks. However, the design of fault-tolerant SDNs remains an open challenge.

This paper considers the design of dependable SDNs through the lens of self-stabilization, a very strong notion of fault-tolerance. In particular, we develop algorithms for an in-band and distributed control plane for SDNs, called Renaissance, which tolerate a wide range of (concurrent) controller, link, and communication failures. Our self-stabilizing algorithms ensure that after the occurrence of an arbitrary combination of failures, (i) every non-faulty SDN controller can reach any switch (or another controller) in the network within a bounded communication delay (in the presence of a bounded number of concurrent failures) and (ii) every switch is managed by at least one controller (as long as at least one controller is not faulty).

We evaluate Renaissance through a rigorous worst-case analysis as well as a prototype implementation (based on OVS and Floodlight), and we report on our experiments using Mininet.
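To make guarantees (i) and (ii) concrete, the following minimal sketch checks them on a toy topology. It is an illustration only, not the Renaissance algorithm: the function names (`hop_distances`, `is_legitimate`), the `managers` map, and the use of a hop bound `max_hops` as a stand-in for "bounded communication delay" are all assumptions introduced here.

```python
from collections import deque


def hop_distances(adj, source, failed_nodes=frozenset(), failed_links=frozenset()):
    """Breadth-first hop counts from `source`, skipping failed nodes and links."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v in failed_nodes or v in dist:
                continue
            if frozenset((u, v)) in failed_links:
                continue
            dist[v] = dist[u] + 1
            queue.append(v)
    return dist


def is_legitimate(adj, controllers, managers, max_hops,
                  failed_nodes=frozenset(), failed_links=frozenset()):
    """Hypothetical check of properties (i) and (ii) from the abstract.

    adj      -- undirected in-band topology as an adjacency dict
    managers -- assumed map: switch -> set of controllers that manage it
    max_hops -- assumed proxy for the bounded communication delay
    """
    live = set(adj) - set(failed_nodes)
    live_controllers = set(controllers) & live
    switches = live - set(controllers)

    # (i) every non-faulty controller reaches every live node within the bound
    for c in live_controllers:
        dist = hop_distances(adj, c, failed_nodes, failed_links)
        if any(dist.get(n, max_hops + 1) > max_hops for n in live):
            return False

    # (ii) every switch has a non-faulty manager, if any controller is alive
    if live_controllers:
        for s in switches:
            if not (set(managers.get(s, ())) & live_controllers):
                return False
    return True


# Toy topology (hypothetical): c1 - s1 - s2 - c2, with s3 attached to s1 and s2.
adj = {
    "c1": ["s1"],
    "s1": ["c1", "s2", "s3"],
    "s2": ["s1", "s3", "c2"],
    "s3": ["s1", "s2"],
    "c2": ["s2"],
}
controllers = {"c1", "c2"}
managers = {"s1": {"c1"}, "s2": {"c1", "c2"}, "s3": {"c2"}}

ok_initially = is_legitimate(adj, controllers, managers, max_hops=3)
ok_after_c2_fails = is_legitimate(adj, controllers, managers, max_hops=3,
                                  failed_nodes={"c2"})
```

In this toy instance the check passes initially and fails once `c2` fails, because `s3` is left with only a faulty manager. A self-stabilizing control plane has to repair exactly this kind of state, for example by having `c1` take over `s3`.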
Introduction

Context and Motivation. Software-Defined Network (SDN) technologies have emerged as a promising alternative to the vendor-specific, complex, and hence error-prone operation of traditional communication networks. In particular, by outsourcing and consolidating the control over the data plane elements to logically centralized software, SDNs support programmatic verification and enable new debugging tools. Furthermore, the decoupling of the control plane from the data plane allows the former to evolve independently of the constraints of the latter, enabling faster innovation.

However, while the literature articulates well the benefits of the separation between control and data plane and the need for distributing the control plane (e.g., for performance and fault-tolerance), the question of how connectivity between these two planes is maintained (i.e., the communication channels from controllers to switches and between controllers) has not received much attention. Providing such connectivity is critical for ensuring the availability and robustness of SDNs.

Guaranteeing that each switch is managed, at any time, by at least one controller is challenging, especially if control is in-band, i.e., if control and data traffic are forwarded along the same links and devices and hence arrive at the same ports. In-band control is desirable as it avoids the need to build, operate, and ensure the reliability of a separate out-of-band management network. Moreover, in-band management can in principle improve the resiliency of a network, by leveraging a higher path diversity (beyond connectivity to the management port).

arXiv:1712.07697v2 [cs.NI] 26 Feb 2019

The goal of this paper is the design of a highly fault-tolerant distributed and in-band control plane for SDNs. In particular, we aim to develop a self-stabilizi...