A network protocol defines the rules that govern communication between two or more machines on the Internet, whereas Automatic Protocol Reverse Engineering (APRE) refers to extracting the structure of a network protocol without access to its specification. Sufficient knowledge of undocumented protocols is essential for security, network policy implementation, and management of network resources. This paper reviews and analyzes 39 approaches, methods, and tools for Protocol Reverse Engineering (PRE) and classifies them into four divisions: approaches that reverse engineer protocol finite state machines, approaches that reverse engineer protocol formats, approaches that reverse engineer both, and approaches that focus directly on neither. The efficiency of each approach's outputs relative to its selected inputs is analyzed in general terms, along with the appropriate input formats for reverse engineering. Additionally, we present a discussion and an extended classification covering automated versus manual approaches, known and novel categories of reverse engineered protocols, and a survey of reverse engineered protocols in relation to the seven-layer OSI (Open Systems Interconnection) model.
Recently, network traffic has become more complex and diverse because of the emergence of new applications and services. As a result, application‐level traffic classification is rapidly growing in importance and has become a very popular research area. Although many traffic classification methods have been introduced in the literature, they have limitations that prevent them from reaching an acceptable level of performance in real‐time application‐level traffic classification. In this paper, we propose a novel application‐level traffic classification method using payload size sequence signatures. The proposed method generates unique payload size sequence signatures for each application from the packet order, direction, and payload size of the first N packets in a flow, and uses them to identify application traffic. The evaluation shows that this method can classify application traffic easily and quickly, with accuracy and completeness rates of over 99.93% and 93.45%, respectively. Furthermore, the method can attribute traffic to its individual application. The evaluation shows that the method can classify the traffic of all applications, both known and unknown (new), into their respective applications, and that it can distinguish between applications whose traffic uses the same application protocol or is encrypted.
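The signature idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the packet encoding (+1 or -1 for direction, multiplied by payload size), the choice to skip zero-payload packets, and the function names `flow_signature`, `build_signatures`, and `classify` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed encoding: a packet is (direction, payload_size), where direction is
# +1 for client-to-server and -1 for server-to-client.
Packet = Tuple[int, int]
Signature = Tuple[int, ...]


def flow_signature(packets: List[Packet], n: int) -> Optional[Signature]:
    """Return the signed payload sizes of the first n payload-carrying packets.

    Packet order is preserved; direction is folded into the sign.
    Returns None when the flow has fewer than n payload-carrying packets.
    """
    signed = [direction * size for direction, size in packets if size > 0]
    if len(signed) < n:
        return None
    return tuple(signed[:n])


def build_signatures(
    labeled_flows: Iterable[Tuple[str, List[Packet]]], n: int
) -> Dict[Signature, str]:
    """Map each signature to an application, keeping only unambiguous ones."""
    seen = defaultdict(set)
    for app, packets in labeled_flows:
        sig = flow_signature(packets, n)
        if sig is not None:
            seen[sig].add(app)
    # A signature observed for more than one application is not unique, so drop it.
    return {sig: next(iter(apps)) for sig, apps in seen.items() if len(apps) == 1}


def classify(
    packets: List[Packet], signatures: Dict[Signature, str], n: int
) -> Optional[str]:
    """Return the application whose signature matches the flow, or None."""
    sig = flow_signature(packets, n)
    if sig is None:
        return None
    return signatures.get(sig)
```

Because only the first N packets are inspected, a lookup of this kind can be performed early in a flow, which is consistent with the real-time goal stated in the abstract. Exact matching is a simplification here; the abstract does not say how size variation is tolerated.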
With the rapid development of the internet and the vigorous emergence of new applications, traffic identification has become a key issue. Although various methods have been proposed, there are still several limitations to achieving fine‐grained, application‐level identification. We therefore previously proposed a behavior signature model for extracting the unique traffic pattern of an application. Although this signature model achieves good identification performance, its signature extraction struggles with large volumes of input traffic because it relies on a Candidate‐Selection method. To address this inefficiency, in this paper we propose a novel behavior signature extraction method using a sequence pattern algorithm. The proposed method can extract a signature regardless of the volume of input traffic because it excludes unsatisfactory candidates, using a predefined support value, during the early stage of the process. We experimentally demonstrate the feasibility of the proposed extraction method for seven popular applications.
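The effect of pruning candidates early with a predefined support value can be sketched as a level-wise sequence pattern search. The following Python example is a generic, Apriori-style illustration and not the authors' algorithm: the event labels, the definition of support as the fraction of flows containing a pattern as an ordered subsequence, and the names `contains` and `frequent_sequences` are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(seq: Sequence[str], pattern: Pattern) -> bool:
    """True if pattern occurs in seq in order (gaps between items allowed)."""
    it = iter(seq)
    return all(any(item == wanted for item in it) for wanted in pattern)


def frequent_sequences(
    sequences: List[Sequence[str]], min_support: float, max_len: int
) -> Dict[Pattern, float]:
    """Return every pattern up to max_len whose support is at least min_support.

    Patterns are grown one item at a time, and only patterns that already meet
    the threshold are extended, so weak candidates are discarded early.
    """
    if not sequences:
        return {}
    total = len(sequences)

    def support(pattern: Pattern) -> float:
        return sum(contains(s, pattern) for s in sequences) / total

    # Level 1: single events that meet the support threshold.
    level: Dict[Pattern, float] = {}
    for item in sorted({x for s in sequences for x in s}):
        value = support((item,))
        if value >= min_support:
            level[(item,)] = value
    frequent_items = [p[0] for p in level]

    result: Dict[Pattern, float] = {}
    length = 1
    while level and length <= max_len:
        result.update(level)
        next_level: Dict[Pattern, float] = {}
        for pattern in level:
            for item in frequent_items:
                candidate = pattern + (item,)
                value = support(candidate)
                if value >= min_support:
                    next_level[candidate] = value
        level = next_level
        length += 1
    return result
```

Since any extension of an infrequent pattern is itself infrequent, the number of candidates examined at each level depends on how many patterns survive the threshold rather than on the raw number of possible combinations.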
Various traffic identification methods have been proposed with a focus on application‐level traffic analysis. Header signature–based identification, which uses the 3‐tuple (Internet Protocol address, port number, and L4 protocol) within a packet header, has garnered a lot of attention because it overcomes the limitations of the payload‐based method, such as encryption, privacy concerns, and computational overhead. However, header signature–based identification has a significant flaw: the volume of header signatures increases rapidly over time as applications emerge, evolve, and vanish. In this article, we propose an efficient method for header signature maintenance. Our approach automatically constructs header signatures for traffic identification and retains only the most significant signatures in the signature repository, to save memory and improve matching speed. For signature maintenance, we define a new metric, the signature weight, that reflects a signature's potential ability to identify traffic. The signature weight is periodically calculated and updated to adapt to changes in the network environment. We demonstrate the feasibility of the proposed method by developing a prototype system and deploying it in a real operational network. Finally, we show the superiority of our signature maintenance method through a comparative analysis against other existing methods on the basis of various evaluation metrics.
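A weight-based repository of this kind might be sketched as follows in Python. The abstract does not define the signature weight, so the formula used here (an exponentially decayed count of matches per update period) is purely a stand-in, and the class name `SignatureRepository`, its methods, and the `capacity` and `decay` parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Header signature key as described in the abstract: (IP address, port, L4 protocol).
HeaderKey = Tuple[str, int, str]


@dataclass
class Entry:
    app: str
    weight: float = 0.0  # assumed metric: decayed match count
    hits: int = 0        # matches since the last periodic update


class SignatureRepository:
    """Keeps at most `capacity` header signatures, ranked by an assumed weight."""

    def __init__(self, capacity: int, decay: float = 0.5) -> None:
        self.capacity = capacity
        self.decay = decay
        self.entries: Dict[HeaderKey, Entry] = {}

    def add(self, key: HeaderKey, app: str) -> None:
        """Register a signature; an existing entry for the key is left unchanged."""
        self.entries.setdefault(key, Entry(app))

    def match(self, key: HeaderKey) -> Optional[str]:
        """Return the application for a header key and record the hit."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.hits += 1
        return entry.app

    def update_weights(self) -> None:
        """Periodic step: refresh weights, then keep only the heaviest entries."""
        for entry in self.entries.values():
            # Hypothetical formula: old weight fades, recent hits are added.
            entry.weight = self.decay * entry.weight + entry.hits
            entry.hits = 0
        ranked = sorted(
            self.entries.items(), key=lambda kv: kv[1].weight, reverse=True
        )
        self.entries = dict(ranked[: self.capacity])
```

In this sketch a signature that stops matching traffic loses weight at every update and is eventually displaced, which bounds the repository size. A real weight would likely also account for factors the abstract only alludes to, so the formula should be read as a placeholder.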