P4

Bosshart, Pat; Daly, Dan; Gibb, Glen; Izzard, Martin; McKeown, Nick; Rexford, Jennifer; Schlesinger, Cole; Talayco, Dan; Vahdat, Amin; Varghese, George; Walker, David

doi:10.1145/2656877.2656890

Cited by 2,091 publications

(286 citation statements)

References 12 publications

“…Recent work has proposed hardware architectures [1,5,11,17] and software abstractions [16,37] for programmable switches. While many packet-processing tasks can be programmed on these switches, scheduling isn't one of them.…”

Section: Related Work (mentioning)

confidence: 99%

Programmable Packet Scheduling at Line Rate

Sivaraman, Subramanian, Alizadeh et al. 2016

Proceedings of the 2016 ACM SIGCOMM Conference

Self Cite

200

Switches today provide a small menu of scheduling algorithms. While we can tweak scheduling parameters, we cannot modify algorithmic logic, or add a completely new algorithm, after the switch has been designed. This paper presents a design for a programmable packet scheduler, which allows scheduling algorithms (potentially algorithms that are unknown today) to be programmed into a switch without requiring hardware redesign. Our design uses the property that scheduling algorithms make two decisions: in what order to schedule packets and when to schedule them. Further, we observe that in many scheduling algorithms, definitive decisions on these two questions can be made when packets are enqueued. We use these observations to build a programmable scheduler using a single abstraction: the push-in first-out queue (PIFO), a priority queue that maintains the scheduling order or time. We show that a PIFO-based scheduler lets us program a wide variety of scheduling algorithms. We present a hardware design for this scheduler for a 64-port 10 Gbit/s shared-memory (output-queued) switch. Our design costs an additional 4% in chip area. In return, it lets us program many sophisticated algorithms, such as a 5-level hierarchical scheduler with programmable decisions at each level.
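The PIFO abstraction named in the abstract can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the paper's hardware design: it uses a binary heap from Python's standard library, and the `PIFO` class, the `strict_priority_rank` function, and the `prio` and `id` packet fields are hypothetical names chosen for the example. The rank is computed once, at enqueue time, which mirrors the abstract's observation that the scheduling decision can be made when a packet is enqueued.

```python
import heapq
import itertools


class PIFO:
    """Push-in first-out queue: push at any rank, pop only from the head."""

    def __init__(self):
        self._heap = []
        # Arrival counter breaks ties so equal ranks leave in FIFO order.
        self._arrival = itertools.count()

    def push(self, rank, packet):
        heapq.heappush(self._heap, (rank, next(self._arrival), packet))

    def pop(self):
        _rank, _seq, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self):
        return len(self._heap)


def strict_priority_rank(packet):
    # Hypothetical rank function: a lower "prio" value is served first.
    return packet["prio"]


def schedule(packets, rank_fn):
    """Enqueue every packet with its rank, then drain the queue in order."""
    queue = PIFO()
    for packet in packets:
        queue.push(rank_fn(packet), packet)
    return [queue.pop()["id"] for _ in range(len(queue))]
```

Swapping `strict_priority_rank` for a different rank function changes the scheduling policy without touching the queue itself, which is the sense in which a single abstraction can express several algorithms.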

“…We present a prototype implementation of DAIET, using P4 [2], for MapReduce-based applications. However, the techniques proposed by DAIET are general enough to be implemented on various programmable network devices, other network programming languages, and be applicable for other applications that follow the partition/aggregate pattern (e.g., graph processing, deep learning, and stream processing).…”

Section: The DAIET Approach (mentioning)

confidence: 99%

“…A communication phase is needed each time workers need to synchronize the computation and, at last, to produce the final output. In these applications, the network communication costs can be one of the dominant scalability bottlenecks especially in case of multi-stage or iterative computations [1]. The advent of flexible networking hardware and expressive data plane programming languages have produced networks that are deeply programmable [2]. This creates the opportunity to co-design distributed systems with their network layer, which can offer substantial performance benefits.…”

mentioning

confidence: 99%

Daiet

Sapio, Abdelaziz, Canini et al. 2017

Proceedings of the 2017 Symposium on Cloud Computing

Many data center applications nowadays rely on distributed computation models like MapReduce and Bulk Synchronous Parallel (BSP) for data-intensive computation at scale [4]. These models scale by leveraging the partition/aggregate pattern where data and computations are distributed across many worker servers, each performing part of the computation. A communication phase is needed each time workers need to synchronize the computation and, at last, to produce the final output. In these applications, the network communication costs can be one of the dominant scalability bottlenecks especially in case of multi-stage or iterative computations [1].

The advent of flexible networking hardware and expressive data plane programming languages have produced networks that are deeply programmable [2]. This creates the opportunity to co-design distributed systems with their network layer, which can offer substantial performance benefits. A possible use of this emerging technology is to execute the logic traditionally associated with the application layer into the network itself. Given that in the above mentioned applications the intermediate results are necessarily exchanged through the network, it is desirable to offload to it part of the aggregation task to reduce the traffic and lessen the work of the servers. However, these programmable networking devices typically have very stringent constraints on the number and type of operations that can be performed at line rate. Moreover, packet processing at high speed requires a very fast memory, such as TCAM or SRAM, which is expensive and usually available in small capacities.

The DAIET approach. In this work, we propose DAIET, a system for data aggregation in-network. DAIET leverages the programmable data plane to reduce the traffic as it is being forwarded towards the destination by opportunistically offloading the aggregation task to the network. In many distributed algorithms, the aggregation function is typically commutative and associative. Therefore, each network device can independently aggregate part of the data without affecting the correctness of the result. Moreover, the destination workers remain in charge of the portion of the aggregation task that is not handled by the network. Since these applications typically exchange the intermediate results with many-to-one communications, DAIET models this pattern using several in-network aggregation trees, where the r...

Amedeo Sapio is also with Politecnico di Torino. SoCC '17, September 24-27, 2017, Santa Clara, CA, USA. © 2017 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-5028-0/17/09. https://doi.org/10.1145/3127479.3132018
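The correctness argument in the abstract (a commutative and associative aggregation function lets each device aggregate independently, with the destination finishing whatever the network did not handle) can be sketched in a few lines. This is a rough illustration, not DAIET's implementation: the functions `switch_aggregate` and `reducer_aggregate`, the `capacity` parameter, and the choice of addition as the aggregation function are all assumptions made for the example. The bounded table stands in for the small switch memory mentioned above, and forwarding pairs that do not fit unchanged is this sketch's own policy.

```python
from collections import Counter


def switch_aggregate(pairs, capacity):
    """Aggregate (key, value) pairs in a table holding at most `capacity` keys.

    Pairs whose key does not fit in the table are forwarded unchanged
    (an assumption of this sketch, modelling limited switch memory).
    """
    table = {}
    spill = []
    for key, value in pairs:
        if key in table:
            table[key] += value
        elif len(table) < capacity:
            table[key] = value
        else:
            spill.append((key, value))
    return list(table.items()) + spill


def reducer_aggregate(pairs):
    """Destination worker: finish the aggregation over whatever arrives."""
    total = Counter()
    for key, value in pairs:
        total[key] += value
    return dict(total)
```

Because addition is commutative and associative, `reducer_aggregate` returns the same result whether it receives the raw pairs or the partially aggregated output of `switch_aggregate`; the switch only reduces how many pairs travel further.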

“…Languages such as P4 [27] are emerging as a way to express such match-action processing in a hardware-independent manner.…”

Section: Introduction (mentioning)

confidence: 99%

Packet Transactions

Sivaraman, Cheung, Budiu et al. 2016

Proceedings of the 2016 ACM SIGCOMM Conference

Self Cite

214

Many algorithms for congestion control, scheduling, network measurement, active queue management, and traffic engineering require custom processing of packets in the data plane of a network switch. To run at line rate, these data-plane algorithms must be implemented in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built. This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on emerging programmable line-rate switching chips. The key challenge is that many data-plane algorithms create and modify algorithmic state. To achieve line-rate programmability for stateful algorithms, we introduce the notion of a packet transaction: a sequential packet-processing code block that is atomic and isolated from other such code blocks. We have developed this idea in Domino, a C-like imperative language to express data-plane algorithms. We show with many examples that Domino provides a convenient way to express sophisticated data-plane algorithms, and show that these algorithms can be run at line rate with modest estimated chip-area overhead.
