Scalable Many-Field Packet Classification on Multi-core Processors

Qu, Yun R.; Zhou, Shijie; Prasanna, Viktor K.

doi:10.1109/sbac-pad.2013.29

Cited by 22 publications

(28 citation statements)

References 23 publications

(43 reference statements)


“…Bit Vector (BV) [4] is a data structure used for merging the partial matching results. For a rule set consisting of N rules, a partial matching result from a specific field is a BV of N bits, each bit corresponding to a particular rule in the III.…”

Section: Related Work (mentioning)

confidence: 99%
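A minimal sketch of the bit-vector merge described in the statement above, assuming bit i of each vector corresponds to rule i and that the per-field vectors are combined with a bitwise AND. The function names, the 8-rule set, and the three field vectors are hypothetical and are not taken from the cited paper.

```python
from functools import reduce


def merge_bit_vectors(partial_bvs, num_rules):
    """AND together one N-bit vector per field.

    A bit stays set only if the corresponding rule matched in every field.
    Each vector is stored as a Python int (assumption for this sketch).
    """
    all_rules = (1 << num_rules) - 1  # start with every rule as a candidate
    return reduce(lambda acc, bv: acc & bv, partial_bvs, all_rules)


def matching_rules(bv):
    """Return the indices of the set bits, lowest index first."""
    rules = []
    index = 0
    while bv:
        if bv & 1:
            rules.append(index)
        bv >>= 1
        index += 1
    return rules


# Hypothetical partial results for an 8-rule set, one vector per field.
field_bvs = [0b10110110, 0b00111100, 0b11010100]
merged = merge_bit_vectors(field_bvs, num_rules=8)
print(matching_rules(merged))  # rules that matched in all three fields
```

With these illustrative inputs only rules 2 and 4 survive the merge, because they are the only bits set in all three vectors.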

“…, M − 1; (2) use a hash value i as the index to access the hash table T and choose another set of M hash functions. We do not introduce the range-tree and the hashing mechanism in detail, since they have been extensively studied [4], [20].…”

Section: A Preprocess (mentioning)

confidence: 99%

“…We use the same hash-based search techniques as in [4]. Note only one BV is extracted for each field in the search phase.…”

Section: B Search and Merge (mentioning)

confidence: 99%

“…To implement our packet classification engine, we can assign each core a single packet header field; however, the communication overhead between different cores during the merge phase limits the performance of this implementation [4]. Therefore, in our implementation, each packet header of W -fields stays in a single core.…”

Section: Performance Model (mentioning)

confidence: 99%

“…As shown in Table III, Vprty (3-bit), MPLS tfc (3-bit) and ToS fields (6-bit) are short fields, the maximum number of unique rules in these fields is very small; therefore the numbers of unique rules in these fields are saturated (ρ (k) = 1). Increasing the number of unique rules is similar to enlarging the original rule set [4], which has a negative effect on the performance (Section V-C). To reduce the number of unique rules after grouping, we only group those (short fields) containing a small number of bits: Eth type, MPLS lbl, and ToS are grouped as one superfield, while Ingr, VID, Vprty, Prtl, and MPLS tfc are grouped as another superfield.…”

Section: A Experimental Setup (mentioning)

confidence: 99%
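A small illustration of why grouping fields can raise the number of unique rules, assuming a "unique rule" in a superfield means a distinct tuple of the grouped field values. The rule values and the `unique_count` helper below are hypothetical; only the field names and bit widths of Vprty, MPLS tfc, and ToS come from the statement above.

```python
def unique_count(rules, fields):
    """Count the distinct value tuples that the given fields take across the rules."""
    return len({tuple(rule[f] for f in fields) for rule in rules})


# Hypothetical rule set; values fit the stated widths (3, 3, and 6 bits).
rules = [
    {"vprty": 0, "mpls_tfc": 1, "tos": 4},
    {"vprty": 0, "mpls_tfc": 1, "tos": 8},
    {"vprty": 3, "mpls_tfc": 1, "tos": 4},
    {"vprty": 3, "mpls_tfc": 2, "tos": 4},
]

# Unique values when each field is searched on its own.
per_field = {f: unique_count(rules, [f]) for f in ("vprty", "mpls_tfc", "tos")}

# Unique value pairs when Vprty and MPLS tfc form one superfield.
grouped = unique_count(rules, ["vprty", "mpls_tfc"])

print(per_field, grouped)
```

In this toy rule set each field has 2 unique values on its own, while the grouped superfield has 3 unique pairs. The count after grouping is never smaller than the largest per-field count and never larger than the product of the per-field counts, which is consistent with the statement's choice to group only short fields.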


Performance modeling and optimizations for decomposition-based large-scale packet classification on multi-core processors

Zhou

Prasanna

2014

2014 IEEE 15th International Conference on High Performance Switching and Routing (HPSR)

Self Cite


Abstract: Large-scale packet classification such as OpenFlow table lookup in Software Defined Networking (SDN) is a key task performed at the Internet routers. However, the increasing size of the rule set and the increasing width of each individual rule make large-scale packet classification a challenging problem. In this paper, we present a decomposition-based approach for large-scale packet classification on multi-core processors. We develop a model to predict the performance of the classification engine with respect to throughput and latency. This model involves the architectural parameters of the multi-core processors and the design requirements of packet classification. Based on this model, we employ optimization techniques such as grouping short fields in the search phase and early termination of the merge phase. The performance model can be applied to other generic multi-field classification problems as well. To evaluate the accuracy of the performance model, we implement a 15-field classification engine on state-of-the-art multi-core processors. Experimental results show that, the proposed model predicts the performance with less than ±10% error. For a 32 K 15-field rule set, the optimized decomposition-based approach achieves 2000 ns per packet latency and 33 Million Packets Per Second (MPPS) throughput (49% of the peak throughput). The peak performance assumes an ideal execution model that uses an optimized execution sequence and ignores memory access latency, data dependencies, and context switch overhead.
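As a rough check on the figures quoted in the abstract, the peak throughput implied by the stated ratio can be worked out as follows. This is only arithmetic on the two numbers given above (33 MPPS and 49%); the symbol T_peak is introduced here for illustration and is not the paper's notation.

```latex
\[
T_{\text{peak}} \approx \frac{33\ \text{MPPS}}{0.49} \approx 67\ \text{MPPS}
\]
```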


Pre-processing Algorithm for Rule Set Optimization Throughout Packet Classification in Network Systems

Kumar

Ramasubramanian

2017

Lecture Notes in Networks and Systems


A Decomposition-Based Approach for Scalable Many-Field Packet Classification on Multi-core Processors

Zhou

Prasanna

2014

Int J Parallel Prog

Self Cite


As a kernel function in network routers, packet classification requires the incoming packet headers to be checked against a set of predefined rules. There are two trends for packet classification: (1) to examine a large number of packet header fields, and (2) to use software-based solutions on multi-core general purpose processors and virtual machines. Although packet classification has been widely studied, most existing solutions on multi-core systems target the classic 5-field packet classification; it is not easy to scale up their performance with respect to the number of packet header fields. In this work, we present a decomposition-based packet classification approach; it supports large rule sets consisting of a large number of packet header fields. In our approach, range-tree and hashing are used to search the fields of the input packet header in parallel. The partial results from all the fields are represented in rule ID sets; they are merged efficiently to produce the final match result. We implement our approach and evaluate its performance with respect to overall throughput and processing latency for rule set size varying from 1 to 32 K. Experimental results on state-of-the-art 16-core platforms show that, an overall throughput of 48 million packets per second and a processing latency of 2,000 ns per packet can be achieved for a 32 K rule set.
