No abstract
In data center applications, predictability in service time and controlled latency, especially tail latency, are essential for building performant applications. This is especially true for applications or services built by accessing data across thousands of servers to generate a user response. Current practice has been to run such services at low utilization to rein in latency outliers, which decreases efficiency and limits the number of service invocations developers can issue while still meeting tight latency budgets.In this paper, we analyze three data center applications, Memcached, OpenFlow, and Web search, to measure the effect of 1) kernel socket handling, NIC interaction, and the network stack, 2) application locks contested in the kernel, and 3) application-layer queueing due to requests being stalled behind straggler threads on tail latency. We propose Chronos, a framework to deliver predictable, low latency in data center applications. Chronos uses a combination of existing and new techniques to achieve this end, for example by supporting Memcached at 200,000 requests per second per server at mean latency of 10 μs with a 99 th percentile latency of only 30 μs, a factor of 20 lower than baseline Memcached.
Data Center topologies employ multiple paths among servers to deliver scalable, cost-effective network capacity. The simplest and the most widely deployed approach for load balancing among these paths, Equal Cost Multipath (ECMP), hashes flows among the shortest paths toward a destination. ECMP leverages uniform hashing of balanced flow sizes to achieve fairness and good load balancing in data centers. However, we show that ECMP further assumes a balanced, regular, and fault-free topology, which are invalid assumptions in practice that can lead to substantial performance degradation and, worse, variation in flow bandwidths even for same size flows.We present a set of simple algorithms that achieve Weighted Cost Multipath (WCMP) to balance traffic in the data center based on the changing network topology. The state required for WCMP is already disseminated as part of standard routing protocols and it can be readily implemented in the current switch silicon without any hardware modifications. We show how to deploy WCMP in a production OpenFlow network environment and present experimental and simulation results to show that variation in flow bandwidths can be reduced by as much as 25× by employing WCMP relative to ECMP.
Solving "Big Data" problems requires bridging massive quantities of compute, memory, and storage, which requires a very high bandwidth network. Recently proposed direct connect networks like HyperX [1] and Flattened Butterfly [20] offer large capacity through paths of varying lengths between servers, and are highly cost effective for common data center workloads. However data center deployments are constrained to multi-rooted tree topologies like Fat-tree [2] and VL2 [16] due to shortest path routing and the limitations of commodity data center switch silicon.In this work we present Dahu 1 , simple enhancements to commodity Ethernet switches to support direct connect networks in data centers. Dahu avoids congestion hot-spots by dynamically spreading traffic uniformly across links, and forwarding traffic over non-minimal paths where possible. By performing load balancing primarily using local information, Dahu can act more quickly than centralized approaches, and responds to failure gracefully. Our evaluation shows that Dahu delivers up to 500% improvement in throughput over ECMP in large scale HyperX networks with over 130,000 servers, and up to 50% higher throughput in an 8,192 server Fat-tree network.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.