Network traffic classification is a vital task for service operators, network engineers, and security specialists to manage network traffic, design networks, and detect threats. Identifying the type/name of applications that generate traffic is a challenging task as encrypting traffic becomes the norm for Internet communication. Therefore, relying on conventional techniques such as deep packet inspection (DPI) or port numbers is not efficient anymore. This paper proposes a novel flow statistical-based set of features that may be used for classifying applications by leveraging machine learning algorithms to yield high accuracy in identifying the type of applications that generate the traffic. The proposed features compute different timings between packets and flows. This work utilises tcptrace to extract features based on traffic burstiness and periods of inactivity (idle time) for the analysed traffic, followed by the C5.0 algorithm for determining the applications that generated it. The evaluation tests performed on a set of real, uncontrolled traffic, indicated that the method has an accuracy of 79% in identifying the correct network application.
Classifying Internet traffic into applications is vital to many areas, from quality of service (QoS) provisioning, to network management and security. The task is challenging as network applications are rather dynamic in nature, tend to use a web front-end and are typically encrypted, rendering traditional port-based and deep packet inspection (DPI) method unusable. Recent classification studies proposed two alternatives: using the statistical properties of traffic or inferring the behavioural patterns of network applications, both aiming to describe the activity within and among network flows in order to understand application usage and behaviour. The aim of this paper is to propose and investigate a novel feature to define application behaviour as seen through the generated network traffic by considering the timing and pattern of user events during application sessions, leading to an extended traffic feature set based on burstiness. The selected features were further used to train and test a supervised C5.0 machine learning classifier and led to a better characterization of network applications, with a traffic classification accuracy ranging between 90-98%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.