Albert Greenberg scite author profile

The nature of data center traffic

Kandula

et al. 2009

Deriving traffic demands for operational IP networks: methodology and experience

Feldmann

Greenberg

Lund

et al. 2001

IEEE/ACM Trans. Networking

397

351

Engineering a large IP backbone network without an accurate, network-wide view of the traffic demands is challenging. Shifts in user behavior, changes in routing policies, and failures of network elements can result in significant (and sudden) fluctuations in load. In this paper, we present a model of traffic demands to support traffic engineering and performance debugging of large Internet Service Provider networks. By defining a traffic demand as a volume of load originating from an ingress link and destined to a set of egress links, we can capture and predict how routing affects the traffic traveling between domains. To infer the traffic demands, we propose a measurement methodology that combines flow-level measurements collected at all ingress links with reachability information about all egress links. We discuss how to cope with situations where practical considerations limit the amount and quality of the necessary data. Specifically, we show how to infer interdomain traffic demands using measurements collected at a smaller number of edge links, the peering links connecting to neighboring providers. We report on our experiences in deriving the traffic demands in the AT&T IP Backbone, by collecting, validating, and joining very large and diverse sets of usage, configuration, and routing data over extended periods of time. The paper concludes with a preliminary analysis of the observed dynamics of the traffic demands and a discussion of the practical implications for traffic engineering.

Data center TCP (DCTCP)

Alizadeh

Greenberg

Maltz

et al. 2010

SIGCOMM Comput. Commun. Rev.

551

326

Cloud data centers host diverse applications, mixing workloads that require small predictable latency with others requiring large sustained throughput. In this environment, today's state-of-the-art TCP protocol falls short. We present measurements of a 6000 server production cluster and reveal impairments that lead to high application latencies, rooted in TCP's demands on the limited buffer space available in data center switches. For example, bandwidth hungry "background" flows build up queues at the switches, and thus impact the performance of latency sensitive "foreground" traffic. To address these problems, we propose DCTCP, a TCP-like protocol for data center networks. DCTCP leverages Explicit Congestion Notification (ECN) in the network to provide multi-bit feedback to the end hosts. We evaluate DCTCP at 1 and 10Gbps speeds using commodity, shallow buffered switches. We find DCTCP delivers the same or better throughput than TCP, while using 90% less buffer space. Unlike TCP, DCTCP also provides high burst tolerance and low latency for short flows. In handling workloads derived from operational measurements, we found DCTCP enables the applications to handle 10X the current background traffic, without impacting foreground traffic. Further, a 10X increase in foreground traffic does not cause any timeouts, thus largely eliminating incast problems.

Join-Idle-Queue: A novel load balancing algorithm for dynamically scalable web services

Xie

Kliot

et al. 2011

Performance Evaluation

324

282
