Computer networks have become a critical infrastructure. In fact, networks should not only meet strict requirements in terms of correctness, availability, and performance, but they should also be very flexible and support fast updates, e.g., due to policy changes, increasing traffic, or failures. This paper presents a structured survey of mechanisms and protocols to update computer networks in a fast and consistent manner. In particular, we identify and discuss the different desirable consistency properties that should be provided throughout a network update, the algorithmic techniques which are needed to meet these consistency properties, and the implications on the speed and costs at which updates can be performed. We also explain the relationship between consistent network update problems and classic algorithmic optimization ones. While our survey is mainly motivated by the advent of Software-Defined Networks (SDNs) and their primary need for correct and efficient update techniques, the fundamental underlying problems are not new, and we provide a historical perspective of the subject as well.
To meet the stringent requirements on the maximally tolerable disruptions of traffic under link failures, many communication networks feature some sort of static failover mechanism for fast rerouting. However, configuring such static failover mechanisms to achieve a high degree of robustness is known to be challenging, in particular when packet tagging or dynamic node state cannot be used. This paper initiates the systematic study of such local fast failover mechanisms which not only provide connectivity guarantees, even under multiple link failures, but also account for the quality of the resulting failover routes, with respect to locality (i.e., route length) and congestion. Failover quality has received less attention in the literature so far, yet it is increasingly important to support emerging applications. We first show that there exists an inherent tradeoff in terms of achievable locality and congestion of failover routes. We then present CASA, an algorithm providing a high degree of robustness as well as a provable quality of fast rerouting. CASA combines two crucial static resilient routing techniques: combinatorial designs and arc-disjoint arborescences. We complement our formal analysis with a simulation study, in which we compare our algorithms with the state of the art in different scenarios and show benefits in terms of stretch, load, and resilience.
We consider the fundamental problem of updating arbitrary routes in a software-defined network in a (transiently) loop-free manner. Our objective is to compute fast network update schedules which minimize the number of interactions (i.e., rounds) between the controller and the network nodes. We first prove that this problem is difficult in general: the problem of deciding whether a k-round update schedule exists is NP-complete already for k = 3, and there are problem instances requiring Ω(n) rounds, where n is the network size. Given these negative results, we introduce an attractive, relaxed notion of loop-freedom. We show that relaxed loop-freedom admits much shorter update schedules (up to a factor Ω(n) in the best case), and present a scheduling algorithm which requires at most Θ(log n) rounds.
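To make the notion of a transiently loop-free round concrete, the following is a minimal Python sketch and not the algorithm from the paper. It assumes a single destination, one next hop per node in the old and the new route, and the strict reading of loop-freedom in which a node updated in the current round may forward along either its old or its new edge. The function name `round_is_loop_free` and the dictionaries `old_next` and `new_next` are hypothetical names chosen for illustration.

```python
def round_is_loop_free(old_next, new_next, done, batch):
    """Return True if updating `batch` in one round cannot create a loop.

    Assumption: nodes in `done` already use their new next hop, nodes in
    `batch` may use either next hop while the round is in flight, and all
    other nodes still use their old next hop.
    """
    edges = {}
    for node in set(old_next) | set(new_next):
        if node in done:
            hops = {new_next.get(node)}
        elif node in batch:
            hops = {old_next.get(node), new_next.get(node)}
        else:
            hops = {old_next.get(node)}
        edges[node] = {h for h in hops if h is not None}

    # Depth-first search with three colours to detect a directed cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def has_cycle(u):
        colour[u] = GREY
        for v in edges.get(u, ()):
            state = colour.get(v, WHITE)
            if state == GREY:
                return True
            if state == WHITE and has_cycle(v):
                return True
        colour[u] = BLACK
        return False

    return not any(
        colour.get(u, WHITE) == WHITE and has_cycle(u) for u in edges
    )


# Hypothetical example: old route s -> a -> b -> d, new route s -> b -> a -> d.
old_next = {"s": "a", "a": "b", "b": "d"}
new_next = {"s": "b", "b": "a", "a": "d"}

one_shot = round_is_loop_free(old_next, new_next, set(), {"s", "a", "b"})
first = round_is_loop_free(old_next, new_next, set(), {"a"})
second = round_is_loop_free(old_next, new_next, {"a"}, {"s", "b"})
```

In this example, updating all three nodes at once can form the loop a -> b -> a, whereas updating a first and then s and b together is safe, so two rounds suffice under the stated assumptions.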
Network failures are frequent and disruptive, and can significantly reduce the throughput even in highly connected and regular networks such as datacenters. While many modern networks support some kind of local fast failover to quickly reroute flows encountering link failures to new paths, employing such mechanisms is known to be non-trivial, as conditional failover rules can only depend on local failure information. Although important insights have been gained in recent years on how to design failover schemes providing high resiliency, existing approaches have the shortcoming that the resulting failover routes may be unnecessarily long, i.e., they have a large stretch compared to the original route length. This is a serious drawback, as long routes entail higher latencies and introduce additional load, which may cause the rerouted flows to interfere with existing flows and harm throughput. This paper presents the first deterministic local fast failover algorithms providing provable resiliency and failover route lengths, even in the presence of many concurrent failures. We present stretch-optimal failover algorithms for different network topologies, including multi-dimensional grids, hypercubes and Clos networks, as they are frequently deployed in the context of HPC clusters and datacenters. We show that the computed failover routes are optimal in the sense that no failover algorithm can provide shorter paths for a given number of link failures.
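The restriction to local failure information, and the resulting stretch, can be illustrated with a small Python sketch. It is not one of the algorithms from the paper: it assumes each node holds a static, ordered list of preferred next hops and forwards to the first one whose incident link is alive, and it measures stretch as the ratio of hop counts, which is one possible definition. The names `route` and `prefs` and the five-node topology are hypothetical.

```python
def route(prefs, src, dst, failed, max_hops=64):
    """Forward hop by hop using only locally visible link failures.

    `prefs[node]` is the node's static, ordered next-hop list (an
    assumption of this sketch). Returns the path as a list of nodes,
    or None if the packet is dropped or exceeds `max_hops`.
    """
    path = [src]
    node = src
    while node != dst:
        # A node only checks the state of its own incident links.
        nxt = next(
            (n for n in prefs[node] if frozenset((node, n)) not in failed),
            None,
        )
        if nxt is None or len(path) > max_hops:
            return None
        path.append(nxt)
        node = nxt
    return path


# Hypothetical topology with links a-b, b-e, a-c, c-d, d-e; destination e.
prefs = {"a": ["b", "c"], "b": ["e"], "c": ["d"], "d": ["e"]}

normal = route(prefs, "a", "e", failed=set())
detour = route(prefs, "a", "e", failed={frozenset(("a", "b"))})
dropped = route(prefs, "a", "e", failed={frozenset(("b", "e"))})

# Stretch as a ratio of hop counts (one possible definition).
stretch = (len(detour) - 1) / (len(normal) - 1)
```

Here the failure of link a-b is visible at a, so the packet takes the longer detour through c and d. The failure of link b-e is not visible at a, so a still forwards to b and the packet is dropped there, which hints at why configuring static failover rules is non-trivial.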