In this paper we present a novel architecture for high-speed and high-capacity regular expression matching (REM) on FPGA. The proposed REM architecture, based on nondeterministic finite automaton (RE-NFA), efficiently constructs regular expression matching engines (REME) of arbitrary regular patterns and character classes in a uniform structure, utilizing both logic slices and block memory (BRAM) available on modern FPGA devices. The resulting circuits take advantage of synthesis and routing optimizations to achieve high operating speed and area efficiency. The uniform structure of our RE-NFA design can be stacked in a simple way to produce multi-character input circuits to scale up throughput further. An n-state m-character input REME takes only O (n × log 2 m) time to construct and occupies no more than O (n × m) logic units. The REMEs can be staged and pipelined in large numbers to achieve high parallelism without sacrificing clock frequency.Using the proposed RE-NFA architecture, we are able to implement 3 copies of two-character input REMEs, each with 760 regular expressions, 18715 states and 371 character classes, onto a single Xilinx Virtex 4 LX-100-12 device. Each copy processes 2 characters per clock cycle at 300 MHz, resulting in a concurrent throughput of 14.4 Gbps for 760 REMEs. Compared with the automatic NFA-to-VHDL REME compilation [13], our approach achieves over 9x throughput efficiency (Gbps*state/LUT). Compared with state-of-the-art REMEs on FPGA, our approach also indicates up to 70% better throughput efficiency.
Abstract-We present the design, implementation and evaluation of a high-performance architecture for regular expression matching (REM) on field-programmable gate array (FPGA). Each regular expression (regex) is first parsed into a concise token list representation, then compiled to a modular nondeterministic finite automaton (RE-NFA) using a modified version of the McNaughton-Yamada algorithm. The RE-NFA can be mapped directly onto a compact register-transistor level (RTL) circuit. A number of optimizations are applied to improve the circuit performance: (1) spatial stacking is used to construct an REM circuit processing m ≥ 1 input characters per clock cycle; (2) single-character constrained repetitions are matched efficiently by parallel shift-register lookup tables; (3) complex character classes are matched by a BRAM based classifier shared across regexes; (4) a multi-pipeline architecture is used to organize a large number of RE-NFAs into priority groups to limit the I/O size of the circuit. We implemented 2,630 unique PCRE regexes from Snort rules (February 2010) in the proposed REM architecture. Based on the place-and-route results from Xilinx ISE 11.1 targeting Virtex 5 LX-220 FPGAs, the proposed REM architecture achieved up to 11 Gbps concurrent throughput for various regex sets and up to 2.67x the throughput efficiency of other state-of-the-art designs.
Abstract-We propose a pipelined field-merge architecture for memory-efficient and high-throughput large-scale string matching (LSSM). Our proposed architecture partitions the (8-bit) character input into several bit-field inputs of smaller (usually 2-bit) widths. Each bit-field input is matched in a partial state machine (PSM) pipeline constructed from the respective bit-field patterns. The matching results from all the bit-fields in every pipeline stage are then merged with the help of an auxiliary table (ATB). This novel architecture essentially divides the LSSM problem with a continuous stream of input characters into two disjoint and simpler sub-problems: 1) O (character_bitwidth) number of pipeline traversals, and 2) O (pattern_length) number of table lookups. It is naturally suitable for implementation on FPGA or VLSI with on-chip memory. Compared to the bit-split approach [12], our field-merge implementation on FPGA requires 1/5 to 1/13 the total memory while achieving 25% to 54% higher clock frequencies.
Abstract-Regular expression matching (REM) with nondeterministic finite automata (NFA) can be computationally expensive when a large number of patterns are matched concurrently. On the other hand, converting the NFA to a deterministic finite automaton (DFA) can cause state explosion, where the number of states and transitions in the DFA are exponentially larger than in the NFA. In this paper, we seek to answer the following question: to match an arbitrary set of regular expressions, is there a finite automaton that lies between the NFA and DFA in terms of computation and memory complexities? We introduce the semideterministic finite automata (SFA) and the state convolvement test to construct an SFA from a given NFA. An SFA consists of a fixed number (p) of constituent DFAs (c-DFA) running in parallel; each c-DFA is responsible for a subset of states in the original NFA. To match a set of regular expressions with n overlapping symbols (that can match to the same input character concurrently), the NFA can require O(n) computation per input character¸whereas the DFA can have a state transition table with O(2 n ) states. By exploiting the state convolvements during the SFA construction, an equivalent SFA reduces the computation complexity to O(p 2 /c 2 ) per input character while limiting the space requirement to O(p 2 × (n/p) c ) states, where c ≥ 1 is a small design constant. Although the problem of constructing the optimal (minimum-sized) SFA is shown to be NP-complete, we develop a greedy heuristic to quickly construct a near-optimal SFA in time and space quadratic in the number of states in the original NFA. We demonstrate our SFA construction using real-world regular expressions taken from the Snort IDS.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.