Abstract-Router virtualization has recently attracted considerable interest in the research community. It allows multiple virtual router instances to run on a common physical router platform. The key metrics in designing virtual network routers are: (1) the number of supported virtual router instances, (2) the total number of prefixes, and (3) the ability to quickly update the virtual tables. The limited on-chip memory of FPGAs calls for memory-efficient algorithms to merge the virtual routing tables. At the same time, because updates from all the virtual routers combine into a high aggregate update rate, the merging algorithm must also be highly efficient, so that the router can support quick updates. In this paper, we propose a simple merging algorithm whose performance is insensitive to the number of routing tables considered; it depends solely on the total number of prefixes. We also propose a novel scalable, high-throughput linear pipeline architecture for IP lookup that supports large virtual routing tables and quick non-blocking updates. Using a state-of-the-art Field Programmable Gate Array (FPGA) along with external SRAM, the proposed architecture can support up to 16M IPv4 and 880K IPv6 prefixes. Our implementation sustains a throughput of 400 million lookups per second, even when external SRAM is used.
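The abstract does not spell out the merging algorithm itself. As a hedged illustration of why merged storage can depend only on the total number of prefixes, the sketch below folds several per-virtual-router tables into one shared binary trie whose nodes carry a per-router next-hop slot; the names (`TrieNode`, `merge_tables`, `lookup`) are hypothetical and this is not the paper's exact algorithm.

```python
# Hedged sketch: merging per-virtual-router routing tables into one shared
# trie. Each node keeps one next-hop slot per virtual router, so node count
# (and hence memory) grows with the union of prefixes, not with the number
# of virtual tables.

class TrieNode:
    def __init__(self, num_routers):
        self.child = [None, None]              # 0-branch and 1-branch
        self.next_hop = [None] * num_routers   # one slot per virtual router

def insert(root, prefix_bits, router_id, next_hop):
    """Insert one prefix (a '0'/'1' string) for a given virtual router."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.child[i] is None:
            node.child[i] = TrieNode(len(node.next_hop))
        node = node.child[i]
    node.next_hop[router_id] = next_hop

def merge_tables(tables):
    """tables: list of {prefix_bits: next_hop} dicts, one per virtual router.
    Returns a single shared trie covering all virtual routing tables."""
    root = TrieNode(len(tables))
    for router_id, table in enumerate(tables):
        for prefix_bits, next_hop in table.items():
            insert(root, prefix_bits, router_id, next_hop)
    return root

def lookup(root, addr_bits, router_id):
    """Longest-prefix match for one virtual router on the shared trie."""
    node, best = root, None
    for b in addr_bits:
        if node.next_hop[router_id] is not None:
            best = node.next_hop[router_id]
        node = node.child[int(b)]
        if node is None:
            return best
    return node.next_hop[router_id] or best
```

Under this reading, inserting or deleting a single prefix touches at most W nodes (W = address width), independent of how many virtual tables share the trie, which is consistent with the abstract's claim that update cost depends only on the prefixes themselves.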
Abstract-Memory efficiency and dynamically updatable data structures for Internet Protocol (IP) lookup have regained much interest in the research community. In this paper, we revisit the classic tree-based approach to the longest prefix matching (LPM) problem in IP lookup. In particular, we target our solutions at a class of large and sparsely distributed routing tables, such as those potentially arising in the next-generation IPv6 routing protocol. Due to longer prefix lengths and a much larger address space, preprocessing such routing tables for tree-based LPM can significantly increase the number of prefixes and/or the number of memory stages required for IP lookup. We propose a prefix partitioning algorithm (DPP) that divides a given routing table into k groups of disjoint prefixes (k is given). The algorithm employs dynamic programming to determine the optimal split lengths between the groups so as to minimize the total memory requirement. Our algorithm demonstrates a substantial reduction in memory footprint compared with the state-of-the-art in both the IPv4 and IPv6 cases. Two proposed linear pipelined architectures, which achieve high throughput and support incremental updates, are also presented. The proposed algorithm and architectures achieve a memory efficiency of 1 byte of memory per byte of prefix for both IPv4 and IPv6. As a result, our design scales well to support larger routing tables, longer prefix lengths, or both; the total memory requirement depends solely on the number of prefixes. Implementations on 45 nm ASIC and a state-of-the-art FPGA device (for a routing table of 330K prefixes) achieve 980 and 410 million lookups per second, respectively, which is well suited for 100 Gbps lookup. The implementations also scale to larger routing tables and to the longer prefix lengths of IPv6. Additionally, the proposed architectures can easily interface with external SRAMs to ease the on-chip memory limitations of the target devices.
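The abstract only names the dynamic program. As a hedged illustration, the sketch below shows the classic interval DP such a split-length optimization suggests: given a per-length prefix histogram and a cost model `partition_cost(lo, hi)` for storing all prefixes with lengths in [lo, hi] as one group, it finds k ranges minimizing total memory. The cost model here (every prefix in a group charged the group's maximum length, as if padded to it) is an illustrative assumption, not the paper's exact model.

```python
# Hedged sketch of a dynamic program choosing k-1 split lengths so that
# prefixes are partitioned into k length ranges with minimum total memory.

def partition_cost(hist, lo, hi):
    """Assumed memory (bits) to store all prefixes of length lo..hi as one
    group: each prefix padded to the group's maximum length hi."""
    return sum(hist[l] for l in range(lo, hi + 1)) * hi

def optimal_splits(hist, k):
    """hist[l] = number of prefixes of length l, for l = 0..W.
    Returns (min_total_cost, list of k (lo, hi) length ranges)."""
    W = len(hist) - 1
    INF = float("inf")
    # dp[j][h] = min cost of covering lengths 1..h with j groups
    dp = [[INF] * (W + 1) for _ in range(k + 1)]
    cut = [[0] * (W + 1) for _ in range(k + 1)]
    dp[0][0] = 0
    for j in range(1, k + 1):
        for h in range(1, W + 1):
            for m in range(j - 1, h):          # last group covers m+1..h
                c = dp[j - 1][m] + partition_cost(hist, m + 1, h)
                if c < dp[j][h]:
                    dp[j][h], cut[j][h] = c, m
    # Recover the k ranges by walking the cut table backwards.
    ranges, h = [], W
    for j in range(k, 0, -1):
        m = cut[j][h]
        ranges.append((m + 1, h))
        h = m
    return dp[k][W], list(reversed(ranges))
```

The DP runs in O(kW^2) cost evaluations with W = 32 (IPv4) or W = 128 (IPv6), which is negligible as a preprocessing step compared with the lookup path itself.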
Abstract-In Network Intrusion Detection Systems (NIDSs), string pattern matching demands exceptionally high performance to match the content of network traffic against a predefined database (or dictionary) of malicious patterns. Much work has been done in this field; however, most prior work suffers from low memory efficiency (defined as the ratio of the required storage, in bytes, to the size of the dictionary, in number of characters). Due to this inefficiency, state-of-the-art designs cannot support large dictionaries without using high-latency external DRAM. We propose an algorithm called "leaf-attaching" to preprocess a given dictionary without increasing the number of patterns. The resulting set of post-processed patterns can be searched using any tree-search data structure. We also present a scalable, high-throughput, Memory-efficient Architecture for large-scale String Matching (MASM) based on a pipelined binary search tree. The proposed algorithm and architecture achieve a memory efficiency of 0.56 (for the Rogets dictionary) and 1.32 (for the Snort dictionary). As a result, our design scales well to support larger dictionaries. Implementations on 45 nm ASIC and a state-of-the-art FPGA device (for the latest Rogets and Snort dictionaries) achieve throughputs of 24 Gbps and 3.2 Gbps, respectively. The MASM module can simply be duplicated to accept multiple characters per cycle, leading to throughput that scales with the number of characters processed in each cycle. Dictionary updates involve simply rewriting the memory content, which can be done quickly without reconfiguring the chip.
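One hedged reading of the leaf-attaching idea: insert all patterns into a trie, then fold each pattern that is a proper prefix of another into the leaf patterns below it, so the emitted set is prefix-free (disjoint) and can be laid out in a binary search tree, with each emitted pattern carrying the IDs of all patterns it subsumes. The sketch below is an illustrative reconstruction under that reading, not the paper's exact procedure; `leaf_attach` is a hypothetical name.

```python
# Hedged sketch of "leaf-attaching": make a pattern set prefix-free by
# attaching every pattern that is a proper prefix of another to the leaf
# patterns below it in a trie. The number of output patterns never exceeds
# the number of input patterns.

def leaf_attach(patterns):
    """patterns: dict {pattern_string: pattern_id}.
    Returns {pattern_string: [ids of all patterns it subsumes]}."""
    trie = {}          # nested dicts; a node may carry an "$id" marker
    for p, pid in patterns.items():
        node = trie
        for ch in p:
            node = node.setdefault(ch, {})
        node["$id"] = pid

    out = {}
    def walk(node, path, inherited):
        ids = inherited + ([node["$id"]] if "$id" in node else [])
        children = [c for c in node if c != "$id"]
        if not children:                # leaf pattern: emit it together
            out[path] = ids             # with all attached (prefix) IDs
            return
        for ch in children:
            walk(node[ch], path + ch, ids)
    walk(trie, "", [])
    return out

# Example: "he" is a proper prefix of "hers", so its ID is attached
# to the leaf pattern "hers".
print(leaf_attach({"he": 0, "hers": 1, "she": 2}))
# -> {'hers': [0, 1], 'she': [2]}
```

Because the output patterns are disjoint, matching reduces to ordinary comparisons in a binary search tree, and a dictionary update only rewrites tree-node contents in memory, consistent with the abstract's claim that no chip reconfiguration is needed.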