Abstract-In this paper, we introduce FlowSifter, a systematic framework for online application protocol field extraction. FlowSifter introduces a new grammar model Counting Regular Grammars (CRG) and a corresponding automata model Counting Automata (CA). The CRG and CA models add counters with update functions and transition guards to regular grammars and finite state automata. These additions give CRGs and CAs the ability to parse and extract fields from context sensitive application protocols. These additions also facilitate fast and stackless approximate parsing of recursive structures. These new grammar models enable FlowSifter to generate optimized Layer 7 field extractors from simple extraction specifications. In our experiments, we compare FlowSifter against both BinPAC and UltraPAC, which are the freely available state of the art field extractors. Our experiments show that when compared to UltraPAC parsers, FlowSifter extractors run 84% faster and use 12% of the memory.
Abstract-Regular expression (RE) matching is a core component of deep packet inspection in modern networking and security devices. In this paper, we propose the first hardware-based RE matching approach that uses Ternary Content Addressable Memory (TCAM), which is available as off-the-shelf chips and has been widely deployed in modern networking devices for tasks such as packet classification. We propose three novel techniques to reduce TCAM space and improve RE matching speed: transition sharing, table consolidation, and variable striding. We tested our techniques on 8 real-world RE sets, and our results show that small TCAMs can be used to store large Deterministic Finite Automata (DFAs) and achieve potentially high RE matching throughput. For space, we can store each of the corresponding 8 DFAs with 25,000 states in a 0.59Mb TCAM chip. Using a different TCAM encoding scheme that facilitates processing multiple characters per transition, we can achieve potential RE matching throughput of 10 to 19 Gbps for each of the 8 DFAs using only a single 2.36 Mb TCAM chip.
In this paper, we propose FlowSifter, a framework for automated online application protocol field extraction. FlowSifter is based on a new grammar model called Counting Regular Grammars (CRG) and a corresponding automata model called Counting Automata (CA). The CRG and CA models add counters with update functions and transition guards to regular grammars and finite state automata. These additions give CRGs and CAs the ability to parse and extract fields from context sensitive application protocols. These additions also facilitate fast and stackless approximate parsing of recursive structures. These new grammar models enable FlowSifter to generate optimized Layer 7 field extractors from simple extraction specifications. We compare FlowSifter against both BinPAC and UltraPAC, which represent the state-of-the-art field extractors. Our experiments show that when compared to BinPAC parsers, FlowSifter runs more than 21 times faster and uses 49 times less memory. When compared to UltraPAC parsers, FlowSifter extractors run 12 times faster and use 24 times less memory.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.