Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency, especially the tail of the latency distribution, can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result which completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.
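The idea in the abstract above, issuing the same operation to several resources and keeping whichever answer arrives first, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: `first_result`, `simulated_lookup`, and the replica tuples are hypothetical names, and the sleep stands in for a real DNS or database request.

```python
import concurrent.futures as cf
import time


def first_result(operation, replicas, timeout=None):
    """Run operation(replica) on every replica at once and return the first
    result that completes without raising. Hypothetical helper."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(operation, replica) for replica in replicas]
    try:
        # as_completed yields futures in the order they finish.
        for future in cf.as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Do not wait for the slower replicas; their results are discarded.
        pool.shutdown(wait=False)


def simulated_lookup(replica):
    """Stand-in for a real request: sleeps for the replica's delay (assumed)."""
    name, delay_seconds = replica
    time.sleep(delay_seconds)
    return name


if __name__ == "__main__":
    replicas = [("server-a", 0.30), ("server-b", 0.01)]
    print(first_result(simulated_lookup, replicas))
```

The sketch makes the cost visible: every call consumes work on all replicas, which is the added utilization the abstract weighs against the latency gain.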
Network middleboxes must offer high availability, with automatic failover when a device fails. Achieving high availability is challenging because failover must correctly restore lost state (e.g., activity logs, port mappings) but must do so quickly (e.g., in less than typical transport timeout values to minimize disruption to applications) and with little overhead to failure-free operation (e.g., additional per-packet latencies of 10-100s of µs). No existing middlebox design provides failover that is correct, fast to recover, and imposes little increased latency on failure-free operations. We present a new design for fault-tolerance in middleboxes that achieves these three goals. Our system, FTMB (for Fault-Tolerant MiddleBox), adopts the classical approach of "rollback recovery" in which a system uses information logged during normal operation to correctly reconstruct state after a failure. However, traditional rollback recovery cannot maintain high throughput given the frequent output rate of middleboxes. Hence, we design a novel solution to record middlebox state which relies on two mechanisms: (1) 'ordered logging', which provides lightweight logging of the information needed after recovery, and (2) a 'parallel release' algorithm which, when coupled with ordered logging, ensures that recovery is always correct. We implement ordered logging and parallel release in Click and show that for our test applications our design adds only 30 µs of latency to median per-packet latencies. Our system introduces moderate throughput overheads (5-30%) and can reconstruct lost state in 40-275 ms for practical systems.
CCS Concepts: • Networks → Middleboxes / network appliances; • Computer systems organization → Availability
Operators and researchers want accurate router-level views of the Internet for purposes including troubleshooting and modeling. However, tools such as traceroute return IP addresses. Because routers may have dozens of IP addresses, or aliases, multiple measurements may return different addresses, obscuring whether they represent the same machine. While many techniques exist to address this issue by identifying some IP aliases, these techniques, even in combination, find only a subset of alias pairs. To improve this state, we design and evaluate a new alias resolution technique using the IP prespecified timestamp option. This option allows a sender to request timestamp values from multiple IP addresses in the same probe. By careful arrangement of these IP addresses, we show that we can infer aliases in many cases. In this paper, we conduct a measurement study of how many routers support IP timestamps, demonstrating that enough honor the option to base our technique on it. Using our technique, and compared to the most accurate alias information available, we find that 94.7% of the aliases identified by our technique are true positives. Further, we show that our IP timestamp-based technique complements existing alias resolution techniques, providing significant gains by discovering previously unidentifiable aliases.
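For readers unfamiliar with the option the abstract above relies on, the following Python sketch encodes and decodes an IP Timestamp option with the prespecified-address flag, following the layout in RFC 791 (type 68, flag value 3, up to four address/timestamp slots, pointer starting at 5). The function names and the example addresses are hypothetical, and the sketch does not reproduce the paper's probe arrangement or its alias-inference rules, which the abstract does not spell out.

```python
import socket
import struct

IPOPT_TIMESTAMP = 68    # option type for Internet Timestamp (RFC 791)
FLAG_PRESPECIFIED = 3   # only the listed addresses may fill in a timestamp


def build_prespec_ts_option(addresses):
    """Encode a Timestamp option with 1 to 4 prespecified IPv4 addresses.
    Hypothetical helper; returns the raw option bytes."""
    if not 1 <= len(addresses) <= 4:
        raise ValueError("the option holds between 1 and 4 addresses")
    length = 4 + 8 * len(addresses)   # 4-byte header + (address, timestamp) slots
    # Pointer 5 marks the first slot as unused; overflow nibble is 0.
    header = struct.pack("!BBBB", IPOPT_TIMESTAMP, length, 5, FLAG_PRESPECIFIED)
    slots = b"".join(socket.inet_aton(a) + b"\x00" * 4 for a in addresses)
    return header + slots


def parse_prespec_ts_option(option):
    """Decode the option into (address, timestamp) pairs; the timestamp is
    None for slots that no router has filled in yet."""
    _type, length, pointer, _oflw_flag = struct.unpack("!BBBB", option[:4])
    filled = (pointer - 5) // 8       # the pointer advances 8 bytes per stamped slot
    entries = []
    for i in range((length - 4) // 8):
        offset = 4 + 8 * i
        address = socket.inet_ntoa(option[offset:offset + 4])
        (timestamp,) = struct.unpack("!I", option[offset + 4:offset + 8])
        entries.append((address, timestamp if i < filled else None))
    return entries


if __name__ == "__main__":
    # Documentation-range addresses, used here purely as placeholders.
    probe_option = build_prespec_ts_option(["192.0.2.1", "198.51.100.1"])
    print(parse_prespec_ts_option(probe_option))
```

Sending such a probe requires a raw socket or a packet-crafting library and is omitted here. In a reply, comparing which slots were stamped is what an alias-resolution tool would then interpret.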