Rollback-Recovery for Middleboxes

Sherry, Justine; Basu, Soumya; Panda, Aurojit; Maciocco, Christian; Manesh, Maziar; Martins, João; Ratnasamy, Sylvia; Rizzo, Luigi; Shenker, Scott

doi:10.1145/2829988.2787501

Cited by 60 publications

(94 citation statements)

References 35 publications

“…On SPLASH-2, Chimera saw overhead ranging from 1.6× for two cores to over 3× for 8 cores. Similar systems exhibited substantially larger overheads [24] or only considered small and easily parallelizable programs [43].…”

Section: Related Work (mentioning)

confidence: 99%

Towards Practical Default-On Multi-Core Record/Replay

Mashtizadeh, Garfinkel, Terei, et al. 2017

SIGARCH Comput. Archit. News

We present Castor, a record/replay system for multi-core applications that provides consistently low and predictable overheads. With Castor, developers can leave record and replay on by default, making it practical to record and reproduce production bugs, or employ fault tolerance to recover from hardware failures. Castor is inspired by several observations: First, an efficient mechanism for logging non-deterministic events is critical for recording demanding workloads with low overhead. Through careful use of hardware we were able to increase log throughput by 10× or more, e.g., we could record a server handling 10× more requests per second for the same record overhead. Second, most applications can be recorded without modifying source code by using the compiler to instrument language level sources of non-determinism, in conjunction with more familiar techniques like shared library interposition. Third, while Castor cannot deterministically replay all data races, this limitation is generally unimportant in practice, contrary to what prior work has assumed. Castor currently supports applications written in C, C++, and Go on FreeBSD. We have evaluated Castor on parallel and server workloads, including a commercial implementation of memcached in Go, which runs Castor in production. CCS Concepts • Software and its engineering → Operating systems

“…One or more backup nodes are assigned to a flow (constraints (11)). A flow is assigned a backup node provided that it is using at least one backup instance hosted on the node (constraint (12)). This model is an Integer Non-linear Program (INLP) because of the non-linearity of equation (10).…”

Section: B Formulation: All-any (mentioning)

confidence: 99%

“…Typically, NFs need to maintain 10-100s of state variables that are per-flow or shared across flows [10]. Backup instances of stateful NFs need to have updated state information to ensure successful failover and service continuity [11], [12].…”

Section: Introduction (mentioning)

confidence: 99%

Towards Carrier-Grade Service Provisioning in NFV

Woldeyohannes, Tola, Jiang 2019

2019 15th International Conference on the Design of Reliable Communication Networks (DRCN)

Network Function Virtualization (NFV) is an emerging technology that reduces cost and brings flexibility in the provisioning of services. NFV-based networks are expected to be able to provide carrier-grade services, which require high availability. One of the challenges for achieving high availability is that the commodity servers used in NFV are more error prone than the purpose-built hardware. The "de-facto" technique for fault tolerance is redundancy. However, unless planned carefully, structural dependencies among network nodes could result in correlated node unavailabilities that undermine the effect of redundancy. In this paper, we address the challenge of developing a redundancy resource allocation scheme that takes into account correlated unavailabilities caused by network structural dependencies. The proposed scheme consists of two parts. In the first part, we propose an algorithm to identify nodes that can be highly affected by a node failure because of their network structural dependency with this node. The algorithm analyzes such dependencies using a recently proposed centrality measure called dependency index. In the second part, a redundancy resource allocation scheme that places backup network functions on nodes considering their dependency nature and assigns the instances to flows optimally is proposed. The results show that not considering the network structural dependency in backup placement may significantly affect the service availability to flows. The results also give insights into the trade-off between cost and performance.

“…Active-active replication, where master and slave are executed on all inputs but only the master's output is released to users, will not work because of the non-deterministic nature of packet processing in middleboxes. Sherry et al [130], [131] proposed a fault-tolerant middlebox, a new design for fault-tolerant middleboxes that achieves correctness, fast recovery with only a slight increase in latency. They took a replay-based approach that maintains a log of inputs to the system and recreates the lost state by replaying the inputs from the log in the event of a failure.…”

Section: State Management (mentioning)

confidence: 99%

Research Challenges for Network Function Virtualization - Re-Architecting Middlebox for High Performance and Efficient, Elastic and Resilient Platform to Create New Services -

Shiomoto

2018

IEICE Trans. Commun.

SUMMARY: Today's enterprise, data-center, and internet-service-provider networks deploy different types of network devices, including switches, routers, and middleboxes such as network address translation and firewalls. These devices are vertically integrated monolithic systems. Software-defined networking (SDN) and network function virtualization (NFV) are promising technologies for dis-aggregating vertically integrated systems into components by using "softwarization". Software-defined networking separates the control plane from the data plane of switch and router, while NFV decouples high-layer service functions (SFs) or Network Functions (NFs) implemented in the data plane of a middlebox and enables the innovation of policy implementation by using SF chaining. Even though there have been several survey studies in this area, this area is continuing to grow rapidly. In this paper, we present a recent survey of this area. In particular, we survey research activities in the areas of re-architecting middleboxes, state management, high-performance platforms, service chaining, resource management, and trouble shooting. Efforts in these research areas will enable the development of future virtual-network-function platforms and innovation in service management while maintaining acceptable capital and operational expenditure.

Key words: network function virtualization, software-defined networking, service chain, policy management, resource management
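
The replay-based approach described in the State Management citation statement above (keep a log of inputs, then recreate lost state by replaying that log after a failure) can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the class names `FlowCounter` and `LoggedMiddlebox`, the packet representation, and the in-memory log are assumptions, not the design from the cited paper. The toy middlebox is also fully deterministic, whereas the statement notes that real packet processing is non-deterministic, so an actual system would have to record more than raw inputs.

```python
from dataclasses import dataclass, field


@dataclass
class FlowCounter:
    """Toy stateful middlebox (hypothetical): counts packets per flow."""
    counts: dict = field(default_factory=dict)

    def process(self, packet):
        flow, _payload = packet
        self.counts[flow] = self.counts.get(flow, 0) + 1
        return self.counts[flow]


class LoggedMiddlebox:
    """Wraps a middlebox with an input log so state can be rebuilt by replay."""

    def __init__(self):
        self.input_log = []       # assumed to survive the failure
        self.box = FlowCounter()  # volatile state, lost on failure

    def receive(self, packet):
        # Log the input before processing it, so no processed input is unlogged.
        self.input_log.append(packet)
        return self.box.process(packet)

    def crash(self):
        # Simulate a failure: all in-memory middlebox state is lost.
        self.box = None

    def recover(self):
        # Start from a fresh instance and replay every logged input in order.
        self.box = FlowCounter()
        for packet in self.input_log:
            self.box.process(packet)


mb = LoggedMiddlebox()
for pkt in [("flow-a", b"x"), ("flow-b", b"y"), ("flow-a", b"z")]:
    mb.receive(pkt)

state_before_failure = dict(mb.box.counts)
mb.crash()
mb.recover()
recovered_state = dict(mb.box.counts)
```

In this sketch the recovered per-flow counts match the counts held before the simulated failure, because replaying the same inputs in the same order through a deterministic function reproduces the same state.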
